toolbox4minecraft / amidst

Advanced Minecraft Interface and Data/Structure Tracking
GNU General Public License v3.0
2.14k stars 239 forks source link

Search maps for specific biomes near spawn #120

Open mgod opened 8 years ago

mgod commented 8 years ago

I noticed there have been a few requests for searching for seeds by features/biomes. These requests have generally been pointed to scripts in various forums. I'm not sure if there's a policy reason not to have these kinds of features in the core app, but I've written a prototype filter for finding seeds with specific biomes near [0, 0] and was wondering if it made sense to clean it up and open a PR for it.

I'm confident I can build the back end for this in a well structured way, but I'm unsure of what the UI would look like if I add this, so I'd want feedback on that before spending too much time on it. In it's most basic form, it seems like this could be done with users adding there own feature_filter.json similar to history.txt.

jmessenger235 commented 8 years ago

Thank you for your patience with my delayed response. I have been on vacation and your proposal took some time to think through.

However, I think that putting a score on a compound filter still makes sense if we agree that a compound filter don't aggregate the score of its children. Meaning, if a filter doesn't have a score attribute, it will have its default score of 1, regardless of what is inside it.

I think that a compound filter should aggregate the score of its children, even if they are all defaulted to 1. Scoring the compound filters adds complexity and I think it will be counter-intuitive in some instances.

There is a use case that does not seem to be well addressed by that format. I may want to search for several different criteria that have no common ground so they would have to be matched separately. In short, there is no way to or multiple different groups. There are a couple of possibilities. I think the best would be to allow references to other groups within a logic filter.

I think that report is not needed in this context. It is only moderately useful in the flat to exclude logic groups, but I don't really see much advantage to being able to exclude a group in a nested format if we take the approach of allowing groups to be referenced in logic filters. It can be implemented, I am just trying to better understand the purpose here as it is something that I would never use as I would always want the max info and would not create groups that I didn't care about.

Below I have an updated example that should allow for more clear parsing and code structure and to demonstrate what I see as the ideal solution to the above mentioned use case. Requiring groups makes the simple case a tiny bit more work, but it will make the code much cleaner if I understand things correctly. This can also be totally mitigated by having the GUI auto fill with a default template that has a group named default.

{
  //here we declare all the groups we need to create our search
  //This is always required
  "groups": [
    //I'll assume that since you proposed this, there is a way to parse these groups
    //as objects with names as this does not seem like a normal use of GSON
    //an example of how would be appreciated
    "foo": [ //a group named foo containing a biome and a structure filter
      {"biomes": [...]},
      {"structures": [...]}
    ],
    "bar:" [ //a group named bar containing a compound filter
      {"and"/"or"/"minScore": [
        {...},
        //we can still nest compound filters if we want
        { "and": [...] },
        // we can also reference another group like functions in programming
        { "group": "baz"}
      ],
      //This would only bee needed for a min score filter
      "minScore" : 4
      }
    ],
    //for convenience, this is the same as {"and": [...]}
    "baz": [{...}, {...}, {...}],

    //this notation becomes redundant as we can now place it within an and criteria
    // using a group object to reference another group
    //a bare string represents a group : this group matches if foo and bar match.
    //"quux": ["foo", "bar"], 
  ],

  //here we define the filter which must be satisfied for the seed to match;
  //it will simply be a reference to a group
  //This will only be needed if more than one group is defined
  "match": "quux",

  //the list of groups which should be reported for additional information on a match
  //if not defined, all groups will be reported
  // do we need this? (see note above)
  "report": ["foo", "baz"]
}
stefandollase commented 8 years ago

I took the time and (re-)read this complete issue as well as the associated pull-requests. I have to say, you had a productive discussion! Thanks to all of you :-)

I am currently in the process of creating a requirements document from it, that should be detailed enough to base an implementation on it. However, it is not yet done and needs some more thought. It will take me at least a few more days.

jmessenger235 commented 8 years ago

Would I be correct to assume that the requirements will be open to critique and addition?

I ask as there are several things that I listed in the secondary Google doc that will be absolutely essential to some of the searches I will be doing, but we're not really discussed in this or in the implementation(s). Primarily, the command line interface and the ability to search a seed list with the center of the search being unique on a per seed basis.

stefandollase commented 8 years ago

Indeed, the requirements will be open for discussion. Separate center points are already part of the requirements. The command line interface will soon be a part of it ;-)

moulins commented 8 years ago

There is a use case that does not seem to be well addressed by that format. I may want to search for several different criteria that have no common ground so they would have to be matched separately. In short, there is no way to or multiple different groups. There are a couple of possibilities. I think the best would be to allow references to other groups within a logic filter.

It's why I allow any type of filters in the match property. For ORing multiple groups, you simply do

"match" : {"or": ["foo", "bar", "baz"]}

The fact that groups is optional is in fact a side-effect of this decision: if you can have anything in match, you could conceivably have a filter with an empty groups object, and all the logic inside match. If groups doesn't exist, we can thus assume it's empty, and have a working filter.

Note that "group reference" strings work exactly the same as your {"group": "foo"}; you can have a group reference anywhere you can have a filter, including inside compound filters. I choose to have strings instead of objects because I find it more elegant (and shorter).

For the report property, I was thinking it would be useful to exclude "helper" groups (groups that don't represent anything but are used to better structure the .json; think of common sub-expressions). Maybe we should have an ignore property instead?

Lastly, on your implementation concerns: if the Json class has a member of type Map<String, Filter>, GSON will deserialize it properly. However, since the filters themselves can be either an object, an array or a string, we will need something more complex to parse them: I was thinking of using a custom JsonDeserializer.

jmessenger235 commented 8 years ago

Ok, those comments clarify what it is that you were going for in that structure with groups and logic. Based on the example I didn't pick up what you explained as being a valid way to use the structure. That addresses what I was looking for. All of that will need to be well explained in the documentation and well defined in the requirements so we avoid weird and unexplained behavior.

I think ignore would make more sense based on that use case.

If we used the syntax {"group": "foo"} would we be able to avoid a custom parser? I agree a string is shorter and probably more elegant, but it seems a bit inconsistent and definitely unconventional to have an array with multiple object types. Additionally, the syntax would make it more clear what is going on.

I agree, this has been a very productive discussion! :-)

moulins commented 8 years ago

If we used the syntax {"group": "foo"} would we be able to avoid a custom parser? I agree a string is shorter and probably more elegant, but it seems a bit inconsistent and definitely unconventional to have an array with multiple object types. Additionally, the syntax would make it more clear what is going on.

To avoid a custom parser, all the filters need to be objects, so we can't have the array shorthand for andfilters. And even then we still need a catch-all class to receive all the properties (like the CriterionJson class in my branch).

I'll try to write a more formal specification for the json format and its semantics during the week-end.

All in all, even if there is still a lot of things to decide (global filter parameters, etc), I'm quite happy with the "core" syntax we've come up with. :)

stefandollase commented 8 years ago

Okay, I thought I had it all figured out and I only need to write it down in an understandable way, but there is still some inconsistency in it. It occurs for two features: The score and the ability to choose a relative center point for a search area. It is relative to another successful match. The second feature has not yet been discussed in here, but I thought this would be really powerful and helpful. Lets have a look at the issue.

Given a search seed as well as a criterion from the search query, the seed searcher needs to be able to decide whether the search seed matches the criterion, as well as the location of the match. To make this decision, it cannot use any additional information, e.g. from the "calling" criterion. If it would use information from the calling criterion, then each criterion that is written down in the query would not directly correspond to one check and thus one result, but it would be more like a procedure which can be called multiple times and which returns different results for different parameters. This would make the search queries conceptually much harder and thus they are harder to write and understand for the user. I would like to not add this complexity to the search queries.


Let me make an example to clarify the above statement: Each seed has a center point for its main search area. This center point is passed as a parameter to the root criterion. This is probably an and or or criterion, so it will pass the center point further down to its child criterions. Imagine the following query:

(A || B) && (A || C)

Imagine, each of the or criterions overwrite the center point for the search, the evaluation would then look like this, where c1 is the center point for the main seed area of the search seed and c2 and c3 are the center points for each of the or criterions:

c1 -> (A || B) && (A || C)
  c1 -> (A || B)
    c2 -> A || B
      c2 -> A
      c2 -> B
  c1 -> (A || C)
    c3 -> A || C
      c3 -> A
      c3 -> C

As we can see, A is evaluated twice, with different parameters, so A is no longer a synonym for a specific match, but for a procedure that checks for a match, based on a given parameter.

BTW: I just used this syntax to keep the example small. This is not supposed to be the query syntax!

Please note, that it is not an issue to reference another criterion, so it is fine to have and and or criterions.

We can solve this issue by not passing any parameters to a criterion. However, we still want to be able to choose a custom center point. To do so, we have to specify this "parameter" directly with the criterion that uses it, so there would be two As, one with c2 as center point and the other with c3 as center point. B and C would also directly specify c2 respectively c3 as their center point. Now, we have no parameters, but only attributes, and all is fine. Lets look at the cases where this is not that obvious and easy to resolve.


The score depends on the location of the match. Thus, it is impossible to short-circuit an operation that calculates its score based on the match location, because there might always be a match with a better score. Simply using the first match would yield false information. In this case, the parameter that is passed to the criterion is a flag, whether it is sufficient to find any match, or whether a given score operation should be applied. Again, this can be resolved by specifying the score operation as an attribute of the criterion.


The part that is really hard is the relative positioning. For example, I have the following query:

A: mushroom island biome with a maximum distance of 1000 blocks from [0, 0].
B: village with a maximum distance of 200 blocks from A
C: stronghold with a maximum distance of 300 blocks from A

Query: B && C

To evaluate B and C, we need to evaluate A first. Also, A is not a parameter for B and C, but it is directly specified as an attribute of the criterion, similar to how and and or criterions work, so this is not an issue. The issue is, that A cannot short-circuit: There might be a mushroom island biome with a village and a stronghold near to it, but if A is short-circuiting it might find and return another mushroom island biome first. If the other mushroom island biome does not have a village and a stronghold near to it, then we have a false negative result. Thus, A cannot short-circuit, so we are effectively passing a flag to A whether it can short-circuit or not. As described above, we cannot pass parameters, so this is bad.


Let me summarize this:

Thus, the only criterion evaluation function has the following signature:

Optional<Match> evaluate();

No parameters and an optional match as return value. Indeed, the return value of this function can be cached (per search seed).

For the score, this works fine, because the criterion will internally find all matches and return only the best match. This works also fine for the custom center point, because it is not a parameter but an attribute of the criterion itself. However, this does not work well for the relative center points, because this feature requires the referred criterion to return all matches, which is not allowed.


Thus, the only way to get the relative center point feature is to somehow force A (the mushroom biome island criterion) to choose exactly one match. To prevent surprising the user, we should not just use the first match. We would have to add an attribute to A that is explicitly given by the user and that can be used to choose one of the matches. However, after thinking about it for a while, I think there is no intuitive way to express this. It would always surprise the user, if a seed that matches the query is rejected and there is no attribute that would change this.

Okay, that's it. However, there might be more issues with the relative center point feature that I am still unaware of, but we first have to resolve this one, I guess.

Here are my questions:

moulins commented 8 years ago

I've written a quick specification here. It's basically the format we discussed here, with some small tweaks. I didn't add all the features we proposed here, because I think we can find a better syntax for them.


@stefandollase : Your analysis for the center point and the score seems correct, but I don't see the problem, because all this "parameter passing" can be done before the search, at compile time so to speak. This has the added benefit that we can "compile down" non-trivial features into more trivial constructs to simplify the search algorithm. (for example, in my implementation I transform a biome criterion with multiple biomes into an or criterion)

For the relative center feature, I agree that such a feature would be useful, but as you said, this pose some complex problems: how to execute it efficiently? To test a criterion with a relative center, we need to conceptually search all pairs of coordinates which could be a match (or even triplets or quadruplets with more complex stuff). A naïve implementation would be unacceptably slow, and to have good performance we would need to do some clever stuff.

To be honest, I'm not sure it's doable without some (very) complex logic, so I think we should wait to have a working implementation with all the other stuff before.

jmessenger235 commented 8 years ago

I didn't add all the features we proposed here, because I think we can find a better syntax for them.

@moulins Please elaborate on that and "small tweaks"? Those things need to be discussed not unilaterally changed.

Additionally, square shape and a center of 0,0 should be the defaults for performance reasons. The spawn should not be used as center by default as it will have several performance implications. Most notable are wasted biome data that will be generated. We should also display a warning about non-512 distances even if we support them as they will also produce more data than will be used based on the comments from @stefandollase. The square should be used to avoid constant square root computations as those will bog the process down due to the nature of branch prediction and the inherent complexity of the computation.

I think waiting until the implementation is more complete to discuss this dilemma would be unwise. Stuff like this can cause too much refactoring and maybe architecture conflicts. We at least need to know where it would fit and how. We will need a way to specify a global offset for each seed (like for searching a list of QWH seeds), but everything else in the relative matching could wait on implementation until a v2. I do think that it would be worth it to eventually implement.


@stefandollase I think your analysis has a basic flaw. I think that by trying to avoid treating a criteria in a relative match set as only a single match it complicates the implementation and usability even more than it would otherwise. A simple analogy could clarify it quite well. Something like, was there a tree ten feet to my left at any point today? Anyone looking at that should easily recognize that they will have to look at everywhere that I have been today, or at least start tracing my steps, and then answer the question by checking for the tree. They don't have to find every tree, but they do need to find at least one that was ten feet to the left. That analogy maps well to biome area and structures like in your example. I don't think that is too challenging to understand and I think, using that, we can look at the code to see a viable solution to the problem.

In the current implementation of the nested version by @moulins , the world filter acts as an administrative entity to match the different criteria. Essentially, it sends a each and every individual point to all of the relevant criteria. I think that we can use that general architecture to solve the problem in one of two ways or maybe both depending on the user needs. A way to specify which one to use might even be helpful. In order for any of these to work reasonably though, we are going to need to write an intelligent biome and structure cache that is managed by the world filter to allow for evaluation of relative criteria in a manner that will not require data regeneration. Additionally, I think we need to implement the search so that it spirals/patterns out from spawn in a distance sensitive manner. That way, the first match will be the closest.

Using your example of:

A: mushroom island biome with a maximum distance of 1000 blocks from [0, 0].
B: village with a maximum distance of 200 blocks from A
C: stronghold with a maximum distance of 300 blocks from A
Query: B && C

One solution is to have the world filter send a point to the mushroom biome criteria for evaluation, then upon failing the world filter moves to the next point. If the mushroom criteria evaluates to true though the world filter detects that there are dependent criteria and then evaluates B & C based on that point (generating additional data into the cache if needed). If that evaluation fails, it will do the same for each point until all area is searched or a match is found. This could be optimized to only check the needed regions by eliminating the already searched areas via bounding box.

The other solution would be to generate a bounding box for any contiguous section of mushroom biome and then perform the evaluation of B & C in a bounding box increased by the radius of B or C respectively. This has some better performance, but is definitely less accurate.

If anything is not clear, please let me know and I will do my best to clarify.

moulins commented 8 years ago

@jmessenger23: I added/modified:

I didn't add:


I don't see how a circular area will hurt performance compared to a square area: the circle is completely contained by its corresponding square, so it won't generate more biome data than a square area (even if it wastes some of it). For the cost of checking the intersection with the circular area, I don't think it has much of a performance impact:

  1. I don't calculate the distance per se, instead I compare the squares directly, which is much less costly.
  2. The branch pattern should be fairly predictable, so again I don't think it will be that costly. But to be sure, we should make some performances tests.

I don't like putting warnings when using non-512-aligned distances and centers, however I do think we should mention the performance cost of non-aligned searches in the documentation. (Or maybe we could eliminate it by only generating the smallest bounding box in each fragment.)


I think waiting until the implementation is more complete to discuss this dilemma would be unwise. Stuff like this can cause too much refactoring and maybe architecture conflicts. We at least need to know where it would fit and how. We will need a way to specify a global offset for each seed (like for searching a list of QWH seeds), but everything else in the relative matching could wait on implementation until a v2. I do think that it would be worth it to eventually implement.

What I mean by "don't implement it yet" is that we shouldn't implement it at the same time as the other features, even if we do need to think about it when deciding the code architecture. However, I don't understand your "global offset" feature. Does the global center point I added fit the bill or is it something else?

jmessenger235 commented 8 years ago

Cluster doesn't cover all for the use cases well. It is actually a very nice way of implementing quad witch hut searches or comparable, but it does not cover the case of wanting 3 villages within a given area very well. It makes you add extra data to the JSON and will result in more checks than needed. It also removes the ability to specify a minimum biome area. In short, cluster is nice but we still need minimum. The other changes are fine by me.

The use case for the max radius is to prevent having separate duplicate filters for different ranges of the same criteria. This is something I often use in my criteria heavy searches to help allow for slightly variant matches that meet the same intent while scoring them to help identify them better. Do you have a suggestion on how to improve the syntax based on that use? I'm open to ideas as what I suggested was pretty much off the top of my head.

As to shapes: It is not so much that circle will be costly compute wise as you have a decent optimization I missed, although it will have some cost on that account and it may be faster overall for large search radii by eliminating the generation of the corners. It is more that it will likely be un-intuitive as a good bit more data than the circle will be generated. After some further thought on the issue, I'm inclined to say we need a third option of stepped circle. Essentially, we use a circle and generate and search any segment that intersects the circle rather than checking each point for intersection. This has the advantages of both as I see it. You search all of the data generated so nothing is "wasted", you eliminate the extra squaring calculations, however you get the advantages of a circle by excluding the far reaches of the corners. As to the default, we can leave that till after benchmarking.

Thank you for clarifying your statement. I apologize for my misunderstanding.

Based on what it looks like the global center should be adequate as long as it can be changed for each seed when searching a seed list. This is mainly for searching quad witch hut lists. This let's you center the search on the center of the huts to get better results.

moulins commented 8 years ago

Cluster doesn't cover all for the use cases well. [...] It does not cover the case of wanting 3 villages within a given area very well.

You're right, I've overlooked this usecase. If size was optional (with the default being no limit on the distance), it should resolve the issue, no?

It is more that it will likely be un-intuitive as a good bit more data than the circle will be generated.

I don't see the problem with that; after all, it's purely an implementation detail, and the user shouldn't be even aware of the extra data (which is only ~10% maximum of the total biome data anyway) generated under the hood.

For the global center point, there is a potential issue I didn't think of: all centers are defined with absolute coordinates, so changing the global center doesn't shift the whole search. Is it a problem for your usecase or not? If it is, we could interpret the centers as offsets; for example, with a global center of [100,200], a criterion with a center of [50,500] would search around [150,600].

jmessenger235 commented 8 years ago

If size was optional and removed the distance limit, cluster would be adequate, but I still don't think that it can replace minimum as a way of implementing that use case. Minimum is simply easier to use and understand. Cluster increases the JSON you have to write and obfuscates the usage by comparison.


I don't see the problem with that; after all, it's purely an implementation detail, and the user shouldn't be even aware of the extra data (which is only ~10% maximum of the total biome data anyway) generated under the hood.

There are two problems. One, I have no idea where you came up with the ~10% extra data max. Two, it violates the intuition that searching the smaller area of a circle will have an improved performance proportionate to the reduction in area. Since a square is 27% larger than a circle I would expect the search to be speed up proportionally and can assume that at least some others will think similarly. However, this will not be the case as up to radii of 2048 we will have to generate all of that data (due to 512 chunk generation) and perform more complex region checking resulting in what will likely be a slower search for radii of less than 2048. In the improved case of radii larger than 2048 where we can start taking chunks out of the corners, however, we still only reduce the biome data generated by less than 7%, but still have the increased computation of the circle edge checking, which might still make it slower. If a user then realizes that there is such a negligible advantage, they may go with a square search for a faster but perhaps slightly more approximate match.


The absolute offsets could be a significant problem. The reason I referred to your global center as a global offset is because it needs to offset all of the criteria. Essentially the global offset/center is relative to 0,0 and then any other offsets are based on that new center. Thus a global center of [100] and a criterion with a center of [50] would search around [150]. We may need a way to specify your case though. We would likely need a relative to parent match for the relative matching we were discussing, an absolute one as in your implementation, and a relative to global center/offset that I would think would be presumed as default.

moulins commented 8 years ago

Sorry for the delay.

If size was optional and removed the distance limit, cluster would be adequate, but I still don't think that it can replace minimum as a way of implementing that use case. Minimum is simply easier to use and understand. Cluster increases the JSON you have to write and obfuscates the usage by comparison.

The thing is, I don't like having both minimum and cluster (they implement very similar functionalities), and having two separated attributes for number and size seems also weird.
With the cluster syntax, your "village" use case is :

{
  "structures": ["Village"],
  "radius": 1024,
  "cluster": { "number": 3 }
}

Personally, I don't find it overly verbose nor unintuitive.


Two, it violates the intuition that searching the smaller area of a circle will have an improved performance proportionate to the reduction in area.

Huh, it seems we don't view things the same way here. Barring any implementation constraints, I find the circular search to be the most intuitive one. After all, the corners of a (2x1024)-square are NOT 1024 blocks away from the center! As long as circular searches are faster than their corresponding square ones, I would favor them. (I don't believe the distance check for a circular area have a noticeable impact on performance, but again we need to benchmark here)


The absolute offsets could be a significant problem [...]

As I don't see useful cases for absolute centers, I've removed them completely: all centers now are relative to their parent's.

Liz4v commented 8 years ago

Personally I prefer a circular search. One that would still search through all of the generated border chunks, and not only inside the declared circle. This gives an interested user a very decent shortlist that can be further examined.

I understand the worry about performance, but the borders (and shape thereof) of the search area are by definition one power below the asymptotic complexity of this search, and therefore not the performance bottleneck.

moulins commented 8 years ago

@ekevoo : The thing is, the "chunks" used by our implementation aren't the Minecraft 16x16-chunks; we generate the biome data in 512x512 fragments, then we search each of them. In the case of 16x16 fragments, having a match a few blocks away from the circle isn't a problem, but in our case, without the distance check, we could have matches a few hundred blocks further than the intended radius.

Liz4v commented 8 years ago

I see. I would still be interested in such a search, though. A normal village is at least 150 blocks on one side, abandoned mines much bigger. For clarity, there could be an additional option of whether or not to discard generated structures outside the radius.

jmessenger235 commented 8 years ago

@moulins No worries, I still hold the longest delay. :-)

Please clarify what you mean by: having two separated attributes for number and size seems also weird. I'm not seeing how it is relevant to my proposal for use. However, it gave me an idea, why don't we change cluster from an object to an integer like radius and put both properties in the filter object itself? We could rename cluster to clusterRadius or similar. Then we have the features and intuitiveness of both without the extra object and syntax.

For example:

{
  "structures": ["Village"],
  "radius": 1024,
  "number"/"minimum": 3
  //optional
  "clusterRadius": 256
}

Is your implementation such that we could run benchmarks in it's current state? I ask as we need to settle the performance part of this as that will impact the decision.

We are both looking at the intuitiveness of two separate aspects. You are looking at the difference between the radius and the hypotenuse of the different search area shapes. This argument has some merit as we do refer to the distance as a radius. I think that since we are working in a world that is built of blocks a radius referring to a square is not too far fetched. In short, I'm inclined to say this isn't too big of an issue.

I on the other hand am looking at why the user would choose square vs circle and seeing the what will be a large chunk of wasted data that I think most users would rather have searched in spite of the distance discrepancy simply as the performance is so minor, if not negligible. Hence the approach of a jagged circle being a really good middle ground. You reduce the generated data in larger radii and exclude the corners that start getting a prohibitively large hypotenuse/radius ratio, but you search all of the data generated and don't ignore the matches in the smaller searches where the hypotenuse/radius ratio is not a big issue.

@ekevoo Do I understand you correctly then that even with a "chunk" of 512 you would prefer that it search the chunks that intersect the circle or at least have it as an option? For example as in the 1024 and 2048 radius searches here.


Use case discussion, we need a new feature... In thinking about uses for the absolute offsets, I thought of a use case. What if you wanted to find a point on a seed that matched all of the criteria that you specified? We don't have that planned, but for players on large multi-player servers, I would think that would be quite useful for those that have the seed.


@moulins Have you given any thought to a syntax that you like better for scaling the score as it relates to distance?

moulins commented 8 years ago

Please clarify what you mean by: having two separated attributes for number and size seems also weird.

By that, I mean exactly what you propose below: your attributes minimum and clusterRadius are completely separated, and nothing indicates that they go hand in hand. If you absolutely want to eliminate the extra object, we could have clusterMinimum and clusterRadius, but I don't think it's better.


The reason I don't like the "jagged circle" isn't really because it may include points which don't lie inside the "true" circle, but because the rule for including them is non-intuitive (move the search center by a few blocks, and you may double the potential area).

Is your implementation such that we could run benchmarks in it's current state? I ask as we need to settle the performance part of this as that will impact the decision.

You're right, I should've made the benchmarks sooner.

Here are the results:
I used a Large Biomes world with seed -7270480555011518541; the fragment from [0;-512] to [512;0] contains nothing but Ocean and Deep Ocean. I used this filter:

{
  "continuousSearch": false,
  "filters": [
    {
      "type":"biome",
      "radius": 4,
      "center": [502, -10],
      "square": true/false,
      "biomes": ["Ocean", "Deep Ocean"]
    }
  ]
}

This filter has three properties, which make it the "worst case" scenario in regard to the performance impact of the distance check:

  1. The search area doesn't cross fragment boundaries, so only one fragment will be generated.
  2. As the fragment only contains Ocean and Deep Ocean, all points inside will be checked against the search area.
  3. The search area is very small, and in the lower-right corner (high-x and high-y), to force AMIDST to test almost all points inside the fragment.

In average (over 1000 iterations), the square version takes 1.41ms, the circular one takes 1.42ms: i.e. no difference between the two.


Use case discussion, we need a new feature... In thinking about uses for the absolute offsets, I thought of a use case. What if you wanted to find a point on a seed that matched all of the criteria that you specified? We don't have that planned, but for players on large multi-player servers, I would think that would be quite useful for those that have the seed.

I'm sorry, but I don't understand you here... you'll have to be clearer ^^


For the progressive score syntax, I was thinking of putting the scale information either inside score or inside radius (in both cases, they would become a compound object, and the current syntax would become syntaxic sugar for the compound object), but I'm not yet sure it's a good solution.

jmessenger235 commented 8 years ago

I would say the reason that there is nothing connecting them is because of the nature of their relationship. They don't go "hand in hand" per say. minimum does not affect clusterRadius but clusterRadius defines how minimum is understood in one of the two use cases. My main objection to your proposal is that the cluster object makes the case of minimum longer to write and shoehorns it into a syntax that is designed for clustering and therefore doesn't make sense for minimum type searches. I think the best is to just support both minimum and the cluster object. Adding a cluster radius and minimum to replace the object was just me trying to find a suitable middle ground.


I agree that a jagged circle is not intuitive and should probably not be default, but I would still like to have it as an option for users that understand the implications and are willing to work with them.

I want to run some more benchmarks as those don't answer the questions that I have adequately about the differences on the finer points of generation. I am more interested in simulated use cases over sample sizes in the realm of 10k-100k seeds. The tests you did are adequate to indicate at a preliminary level what it is that I said about the lack of performance improvement for the circle with square actually being a bit faster (~1%).

I think the whole discussion would be better made moot by allowing the user to specify defaults for some core settings globally. Defaults like the shape of the search area, the minimum biome area, minimum number of structures, and score scaling. If we implemented that, I would care orders of magnitude less what we default the search shape to.


What if you wanted to find a point on a seed that matched all of the criteria that you specified?

I'm not sure how to better communicate that, but I'll give it a go. Let's say I play on a server and I have the seed for that server. Now let's say I want to find a really good spot for my base and I have a couple of criteria in mind. I can't really do that with the current planned implementation. What I would like to add (or at least consider) is allowing the user to search a set of criteria against each coordinate in a very large radius of a single seed. Then you can get a list of coordinates (or an average point with an area of the viable coordinates) that would meet that criteria on that seed so you can pick a location for your base.


That might be viable, I would be interested in a more concrete syntax.

What would you think of reversing the roles in a sense. Something like an optimalRadius that would be the smallest distance at which the score would be highest. That way radius would stay consistent as the furthest point to look. Then even with the default score of 1 the feature still works and radius is consistent.

moulins commented 8 years ago

In light of your comments, I've added a "defaults" root attributes to specify defaults for all criteria, and I added a shorthand for the cluster syntax (here).


I'm not sure how to better communicate that, but I'll give it a go. [...]

Ahh, I see what you mean :) Tell me if I'm wrong, but it looks like a special case of a relative search, doesn't it?

If we write the example used by @stefandollase as such

Find a Mushroom Island biome within 1000 blocks of [0, 0] (called A), such as
   - there is a village within 200 blocks of A
   - there is a stronghold within 300 blocks of A

We can write your single-seed search as

Find a point within [very large number] blocks of [0;0] (called A), such as
   - there is such and such within X blocks of A
   - ...

Thus, implementing @stefandollase 's relative search would instantly make your use case possible.


If we "reverse the roles" as you say, I think we can go further by getting rid of optimalRadius. The score scaling would be controlled by a single parameter, (let's call it scale for now), which would be the maximum bonus, given when the criterion matches exactly at the center of the area.
This give us the following formula for the score:

//we clamp at 0 to avoid having negative values
//when matching in the corners of a square area
score = baseScore + scale*max(0, 1 - matchDistance/radius)

For example, if we have a search with a radius of 1000 blocks and a scale of 100, a match at 400 blocks of the center would give a bonus of 100*(1 - 400/1000) = 600.

For the actual syntax, we could have either

{
  ...
  "radius": 1000,
  "score": {
     "base": 10, //the original score value
     "distance": 100 //the scale parameter
  }
}

or

{
  ...
  "radius": {
    "max": 1000, //the original radius value
    "bonus": 100 //the scale parameter
  },
  "score": 10
}
jmessenger235 commented 8 years ago

I looked over your JSON spec and noticed a couple of consistency things such as shape not appearing on structure criteria, and jagged circle being a missing shape option. Would edits on those items be welcome? For clarification, are edits in general welcome, either for language, grammar, features, or content?

We really should also add spots for setting the defaults for the minimum biome area, minimum number of structures, and score scaling like I mentioned in the previous comment when we figure out how to express them.

If you insist on trying to shorthand cluster like that, change it to collection, size, and radius for the object name and properties. The shorthand as it stands is horrible for English comprehension as the concept of cluster and minimum are so woefully unrelated. At least collection, size, and radius would make sense in both cases from a linguistic standpoint.

As a secondary, note, I'm not too particularly fond of creating a syntax that needs a custom parser. It places an increased reliance on our custom parser which will result in JSON that is technically standards compliant, but in reality is highly non-standard due to the type variance. That makes it much harder to integrate with other tools and editors/parsers. It increases code complexity and it also can make backwards compatibility a much larger challenge if something needs to be changed.


Glad I was able to clarify :)

If I understand the proposal @stefandollase put forward, it is not quite the same. The problem is that one criteria needs to be set as the center match. To understand the problem, think about a square or triangle. The corners are much closer to the center than to each other. Because of that, it makes it harder to find precise matches without increasing distance and therefore matches that are sub-par. Nevertheless, it is close and would certainly make the case easier to implement.

Speaking of the relative match feature, @stefandollase were our comments and discussion enough to resolve the issues you were having with the relative search or is that still unresolved?


Based on that suggestion for score scaling, I would counter propose:

{
  "radius": {
    "max": 4096,
    "optimum": 1024
  },
  "score": 10
}

//clamp to 0 for corners of squares and to correct values less than optimum distance
score = baseScore *  max(0, 1 - (max(0, (matchDistance - optimumlDistance))/(maxRadius - optimumlDistance)));

Keep in mind that for this to be remotely acceptable it will need a shorthand for a plain radius. As mentioned above, I have objections to the shorthand if they result in custom parsers, but this is given presuming those objections will be ignored or that a custom parser will not be required.

This counter proposal is to address what I consider to be two weaknesses in your proposal. Namely the multiple occurrences of of a score type value and the inability to say that under a certain distance I really don't care to have the score adjusted. The combination of equation and syntax resolve those. As a result of this, optimum can be left out if desired and score could be left out as well, but a setting to default to use scaled score would be needed. The advantage of that is that you can take advantage of the feature with negligible effort as all of the filters will be default to a score of 1 as previously discussed.

jmessenger235 commented 8 years ago

@stefandollase sorry we keep moving the target on your specs :)

moulins commented 8 years ago

I looked over your JSON spec and noticed a couple of consistency things such as shape not appearing on structure criteria, and jagged circle being a missing shape option. Would edits on those items be welcome? For clarification, are edits in general welcome, either for language, grammar, features, or content?

If you look closely, I say

All the biome criterion properties' are usable in a structure criterion.

in the list for the structure criterion. Should I make it clearer? Feel free to edit whatever you want; I would like to have a "central" reference which will be used by the implementation, but as there is no implementation yet... (I would like to start implementing the parser this weekend)

For your others remarks, I agree that the names I chose weren't very clear: I changed number and size as you proposed, but I still prefer cluster (I find collectionawkward). I also added "circle_nocheck" and "square_nocheck" to the shapes list. The _nocheck suffix means that we omit the per-point distance check.

We really should also add spots for setting the defaults [...]

I added the "defaults" property exactly for this purpose.

On your complexity concerns for the parser, I think it's the price to pay if we want a compact syntax. I foresee that the majority of the complexity will be to infer the correct criterion type from the fields, and as neither of us want an explicit "type" property, we don't really have a choice.


The problem is that one criteria needs to be set as the center match.

There is no syntax for it yet, but if we put a dummy criterion which always matches to act as the center point, shouldn't it be enough? (Notice how in my pseudo-syntax there is no constraints on the point A other than the radius)


I quite like your suggestion, but you've removed the ability to have a base score which is not affected by the scaling (with your syntax, a match right at the border will always have a score of 0).

jmessenger235 commented 8 years ago

I missed the structure referencing the biome properties. It was too late here last night when I was reviewing it. In referencing all of the biome properties that implicates variant. What use does that serve in the structure case? My inclination would be to restructure the properties into common ones and then ones unique to structures and biomes in a separate section.

I agree collection is perhaps a bit awkward in the case of the cluster use case, but don't you think it is a more balanced term that fits both cases better from a meaning standpoint even if it's not perfect for both?

I added the "defaults" property exactly for this purpose.

I understand, it is just more that we will need to add them at some point.

I agree the complexity is probably needed, I mainly mention it as a voice of concern for consideration. I have had to do maintenance on several large parsers in the work that I do and that type of stuff for shorthand and multi-expression has been the bane of my existence.


Having something that matched each point would solve that issue. I wondered if that was you meant by that, but I wasn't sure so I pointed it out so we could clarify.


I am okay with having removed that, but if we wanted to leave the base score alone so it wouldn't be zero, we could simply set it to baseScore + my previous equation. Then the range would be from baseScore to 2xbaseScore depending on distance. We could allow the user to use either equation based on a setting in the defaults section. That said, we will need to make sure that the code for the and is adjusted appropriately so that scaling the score still makes it behave like an and rather than a min score.

jmessenger235 commented 8 years ago

@stefandollase any progress on the specs?

stefandollase commented 8 years ago

@jmessenger235 Thanks for your patience! It has been three weeks since the last activity on this issue. However, when the conversation was going again, I decided to pause writing my requirements document and to wait until you slow down again, so I can think about and include the new suggestions. However, since then I did not get any time to think about this again, so there is currently no progress to report.

Indeed, I want to take the time and write a good requirements document, but I need enough time to do so. It does not seem to make sense for me to just quickly put something together, because this will break down in the future. To prevent these problems I want to take the time to do it right, but since I don't have the time for it, it currently has to wait, unfortunately. I hope, you can understand this?

jmessenger235 commented 8 years ago

As someone in college and working I totally get the time thing and I appreciate your dedication to quality and documentation!

What would you think of myself and @moulins trying to work together to put a requirements document together for your review? From your previous comments it sounds like you have a start that would give us a clear idea what you were thinking of for level of detail and format. I have professional experience with dealing with requirements so I would think, when coupled with moulins' capabilities, that it would let us help you as much as possible with the time issue and help get the feature(s) released much sooner.

@moulins do you have professional development/programming experience that includes software requirements?

moulins commented 8 years ago

@jmessenger235 : I'm still in college, so I don't have any real experience with drafting a document like this. I do have a few personal projects, which obviously have requirements, but I never bothered to write them down in full detail.

@stefandollase : I don't know what your specification looks like, but I've made this page, which regroups all the features we discussed, except for the relative search and the details of the score system (these features are either complex or not fully fleshed out, so I omitted them for neo). Maybe you could use it as a starting point?

stefandollase commented 8 years ago

As someone in college and working I totally get the time thing and I appreciate your dedication to quality and documentation!

Thanks! It is nice to hear some appreciation :-)

What would you think of myself and @moulins trying to work together to put a requirements document together for your review?

I think we should do this, since I am not sure when I get the time to really think about this again. Are you fine with this, @moulins?

I propose the following process: I create a pull-request which includes my requirements document. Also, I give both of you access rights to the main repository (please don't abuse this!). Then, you can discuss changes in the pull-request thread and directly commit to my pull-request. I think this will make this task much easier for you. What do you think about it?

Nevertheless, I will need to look into my requirements document and make sure that the current state is at least understandable for you, so this will again take some time, but hopefully not too long.

@moulins I looked into your wiki page. I think it is a good starting point for a documentation page. However, the requirements document should be more detailed and work out all the hidden issues, complexities and ambiguous semantics.

jmessenger235 commented 8 years ago

I would be willing to follow whatever work flow that you would prefer. I am not familiar with committing changes to a pull request directly, but I'd love to learn! If the pull request was just for the requirements document, then that should work. If it was for requirements and code, it could get messy. I would certainly not abuse write access and would ask when in doubt. With great power comes great responsibility. :-)

stefandollase commented 8 years ago

@jmessenger235 I just created #228 which contains the draft of the seed search requirements. As you can see, I struggled to find a readable format for it (I never actually wrote down a requirements document like this). Please, feel free to give it a format that you think is readable and useful. Also, I just gave you write access to the repository, so you can directly commit to #228.

@moulins What do you think? Are you fine with helping @jmessenger235 to refine the requirements? If this is the case, I am going to give you write access to the repository, so you can directly commit to #228 (Please, don't abuse it!).

stefandollase commented 8 years ago

In my draft requirements files, I also found some prepared answers to old comments in this issue. It seems like I never actually posted them. Sorry for that. Here they are:

@alchimystic Amidst actually uses the biome generator from the Minecraft jar file. We are able to find the necessary classes and methods in every version of Minecraft. For more information see the classes DefaultClassTranslator, LocalMinecraftInterface, LocalMinecraftInterfaceBuilder as well as the package admist.clazz.

@jmessenger235 Regarding the WorldFilter class names: The previous names implied another inheritance hierarchy, which I found very confusing. I agree, that the current names are not nice, but they are more descriptive. If you can think of other names that are equally descriptive and less ugly, I am happy to hear them :-)

Also, I checked the remaining TODOs in the code, they are really not that important. I guess I had more of them while I implemented the code and then slowly worked them off one after the other.

moulins commented 8 years ago

@stefandollase I'm ok with it. :)

stefandollase commented 8 years ago

@moulins Nice to hear this. I just added you as a collaborator :-)

jmessenger235 commented 8 years ago

Based on #225 we should add a feature to allow for capturing a png of an area for matching seeds. The best way to see this done in the syntax will take some thought. It will also take some flushing out in the specs to allow for precise definition of behavior.

Bexley75 commented 7 years ago

curious on the status of this one?

I've taken a quick look at what is there and it seems ok, but for some reason it never seems to find any of the Temple/Witch Hut/Igloo/Jungle Temple structures anymore?

stefandollase commented 7 years ago

There is still a lot of work to do until this feature is part of a release. I currently don't have the time that would be required, so I focus on more important maintenance tasks like fixing bugs and adding support for new Minecraft versions. I don't know the reason why the other developers stopped working on this.

Bexley75 commented 7 years ago

oh, ok thanks for the reply, I've been playing around with it today, haven't been able to figure out the structure problem yet tho, I just haven't seen a structure match the allowed biomes yet, it never seems to...

works well at finding the biomes tho, and its fairly fast, I've scanned a million or so seeds in a few hours... it def beats any of the options out there... have found some great seeds already :)

might spend some more time on it if I get a chance, that's if I don't lose myself starting fresh on one of these seeds :)

Bexley75 commented 7 years ago

was fairly easy to fix, it was just doing the final match on name instead of label and not matching... also added in the ability to search for clusters of structures (e.g 4 which huts within a given distance)

works great!!!

stefandollase commented 7 years ago

I am happy to hear that the problem is solved. Maybe, you can create a pull-request for future reference, so we can have a look at it when we actually continue to work on this? However, I should let you know, that there have already been several different implementations of the seed search feature. The biggest time sink was to figure out the differences, advantages and disadvantages of each approach and to unify them to a single implementation. Adding yet another approach would probably not help, however to me it sounds like you did only minor changes so I would like to have a look :-)

jmessenger235 commented 7 years ago

169 fixed the name label issue already although it has other issues since I didn't keep it up to date with the current implementation. That has been an ongoing bug. Life created a perfect storm and I didn't have the time to work on this. @stefandollase I presume that if we were to implement this that we would still be following the same path and picking up where we left off by completing the DDs on #228? @moulins do you still have any interest on working on this? If not, what is the state of your fork as far as features were concerned?

stefandollase commented 7 years ago

@jmessenger235 Good to hear that it was already fixed :-) Indeed, I would like to stick to the process as described in #228. Nevertheless, my amount of free time for the project has not changed, so I will still go at a slow pace :-/

Bexley75 commented 7 years ago

I did have a read of those documents, def. some interesting ideas in there, and some I'd be interested in.

The search being based around the spawn kinda sucks, I think I've lost a tonne of seeds that meet my search purely because the bit I'm looking for doesn't have to be at spawn.

@jmessenger235 I looked at your branch and just stole the bit for the quad which huts. Instead of being hard coded to 4 witch huts though, I re-wrote your function to work on any size and distance (from the Json) and so it can apply to any structure. I've attached the file if you are interested in the changes.

I also added the ability to start with a seed and just increment, I think that's more practical than random, because over days of searching random is potentially repeating seeds and at least by reporting back the last seed I can pick up where I left off...

I'm considering looking at this some more because I don't think I am going to find the seed I am looking for this century. I think the biggest problem here is searching massive maps vs searching a massive number of maps...

@stefandollase the bug fix was just to change return new NameFilteredWorldIconCollector(structure.getName()); to return new NameFilteredWorldIconCollector(structure.getLabel()); in that same file...

WorldFilter_Structure.txt

jmessenger235 commented 7 years ago

@Bexley75 what are you looking for as far as criteria? I'm interested in making sure we have another use case in writing the docs and possibly helping optimize.

Keep in mind that in the docs that I drafted my code for quad structures is horrible perf wise and the structure code has a possible perf bug that I need to fix in a final implementation. If that plays into it there are better ways to find a similar requirement and I can point you in that direction. Also realistically, due to spawn algorithims you won't find anything denser than a quad without significantly changing the structure max distance because you can only get one temple structure per 512x512.

stefandollase commented 7 years ago

@Bexley75 I now understand the issue you want to solve by changing the name to the label. However, I think there is still a problem: The label is essentially just a text that is displayed to the user. This text might change in the future and changing the displayed text should not break existing search queries. Considering internationalization, the text might not even be the same for different users in the future. Instead, we should probably add some kind of type to the world icons, similar to the name of the DefaultWorldIconTypes. However, I explicitly chose not to do this, but I don't remember why :-/ I will have to look into this.

Bexley75 commented 7 years ago

so that was my first thought @stefandollase it doesn't look right that its matching label, it needs to be more consistent, however I just wanted to get it working so didn't dig too deep...

@jmessenger235 i'm looking for a seed for my next survival map for the next release... but want to do a quad witch hut because I haven't before, so what i'd really like to do is find one with certain biomes and a end portal etc close by... I don't care if its near spawn as I am happy enough to move the spawn location.... it makes me think of nested filters, where you centre a search based on the results of a match... i'm searching 4096 for the hut atm, but obviously going bigger with multiple filters isn't really going to work...

I like coding for games as much as playing tho, so i'm tempted to contribute here if you guys are looking for help, i'm a c++/c# coder mainly, I only code java for minecraft related stuff... the problem of searching is interesting and I've like this app since I first found it...

I was looking around at whats out there and ppl are parsing Amidst images with python scripts, i'd rather go to the source :)

stefandollase commented 7 years ago

@Bexley75 I am always happy to see new contributors :-) Please have a look at Supporting the Development and Setting up the Environment.

Regarding the implementation of this specific feature: Please have a look at #228. We currently mainly work to get this done. As I said above, adding just another implementation that is mainly different in the one or the other detail does probably not help to get this feature in a release in the near future. This is because it is hard to merge different design decisions when they are obfuscated by an actual implementation. Instead, we should focus on #228 to merge the different design decisions directly.

Regarding the question whether it is fine to only search near the world spawn point: I think it is fine to do so. The search time should be linear with the size of the searched area, so if you check an area that is twice as big in each map, the number of searched seeds should be cut in half. I remember that we planned to make the size and location of the search area configurable, but you will have to look at #228 for the details about that.

If you want to make changes to the documents in #228, please let me know so I can add you as a contributor to the project.

moulins commented 7 years ago

Hi, it's been a long time since I was here last time! I've been busy with my studies until recently, so I haven't done much work in my fork (other than keeping it up-to-date), but I'm still motivated to work on this.

Feature-wise, the only thing in my branch which works is the JSON parsing (with this format), and some QoL changes which should probably be in a separate pull request (for example, coordinates don't use longs anymore, since they're inside [-30,000,000; 30,000,000] anyway).

I'll re-familiarize myself with the code this week, and start laying down the skeleton implementation soon after (I have a rough idea on how to do it, but won't know if it'll work until I start).