inaturalist / iNaturalistMLWork

Adapt geo model threshold and eval scripts to accommodate inner nodes #60

Open alexshepard opened 5 months ago

alexshepard commented 5 months ago

Once the geo models are trained with data at inner nodes, the threshold and eval scripts will need to be updated so that the training data for an inner node x also accounts for the data included in the export at child node _xy, since that's how the export data is organized. We can't use the geo models without thresholds, so this is blocking #32.

Patrick advised:

If we generated nested set values for all taxa and injected those into the train_df_h3, or had them in some other structure that we could query, then instead of train_df_h3[train_df_h3.taxon_id == taxon_id] we could query something like train_df_h3[(train_df_h3.left >= taxon_left) & (train_df_h3.right <= taxon_right)]. Or query the other structure for all taxa with nested-set values between the target taxon's nested-set left and right, then query where train_df_h3.taxon_id is in that potentially very large array of descendant IDs. No idea how performant that would be. ModelTaxonomyDataframe already generates nested set left and right values for the taxonomy, so that could be queried to get the IDs of descendants, or those data could be copied into the train_df_h3 and queried there.
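
A rough sketch of the two query options described above, for discussion only. The toy taxonomy_df, its column names (taxon_id, left, right), the h3_04 column, and the helper function names are all assumptions made for illustration; they are not the actual ModelTaxonomyDataframe API or the real export schema.

```python
import pandas as pd

# Hypothetical toy taxonomy with nested-set values (assumed column names).
# Tree: 1 -> {2 -> {3, 4}, 5}
taxonomy_df = pd.DataFrame(
    {
        "taxon_id": [1, 2, 3, 4, 5],
        "left": [1, 2, 3, 5, 8],
        "right": [10, 7, 4, 6, 9],
    }
)

# Hypothetical training rows; the real train_df_h3 has more columns.
train_df_h3 = pd.DataFrame(
    {
        "taxon_id": [2, 3, 3, 4, 5],
        "h3_04": ["a", "b", "c", "d", "e"],
    }
)


def descendant_ids(taxonomy_df, taxon_id):
    """IDs of the taxon itself plus all of its descendants."""
    target = taxonomy_df.loc[taxonomy_df["taxon_id"] == taxon_id].iloc[0]
    in_subtree = (taxonomy_df["left"] >= target["left"]) & (
        taxonomy_df["right"] <= target["right"]
    )
    return taxonomy_df.loc[in_subtree, "taxon_id"]


def rows_via_isin(train_df_h3, taxonomy_df, taxon_id):
    """Option A: look up descendant IDs first, then filter with isin."""
    ids = descendant_ids(taxonomy_df, taxon_id)
    return train_df_h3[train_df_h3["taxon_id"].isin(ids)]


def rows_via_copied_columns(train_df_h3, taxonomy_df, taxon_id):
    """Option B: copy left/right onto the training rows, then range-filter."""
    with_ns = train_df_h3.merge(
        taxonomy_df[["taxon_id", "left", "right"]], on="taxon_id", how="left"
    )
    target = taxonomy_df.loc[taxonomy_df["taxon_id"] == taxon_id].iloc[0]
    # Note: pandas needs & with parentheses here, not the Python `and` keyword.
    return with_ns[
        (with_ns["left"] >= target["left"]) & (with_ns["right"] <= target["right"])
    ]


inner_node_rows = rows_via_isin(train_df_h3, taxonomy_df, 2)
```

In practice the merge in option B would presumably be done once up front rather than per taxon, so each per-taxon query is only the range comparison. Which option is faster on the real export is something this sketch doesn't answer.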