phyloref / klados

A curation tool to edit test cases for the Phyloref curation workflow
http://www.phyloref.org/klados/
MIT License
2 stars 1 forks source link

Export Phyx view as CSV #238

Closed gaurav closed 2 years ago

gaurav commented 2 years ago

This PR adds an "Export as CSV" button to the Phyloref summary in the Phyx view so that that table can be exported as a CSV file (see attached screenshot). The generated CSV file expands some of the fields in the visible table (see example: Alligatoridae_Alligatorinae_and_4_others.csv from the screenshot below). To implement this, I moved the download filename generation code into the store so that it could be used in multiple places in this application.

If the phyloreference resolves to an unlabelled node, this will appear in the exported CSV as "(unlabelled)" (see example CSV).

This PR also includes an unrelated fix that I noticed (getDefaultNomenCodeURI() referred to NAME_IN_UNKNOWN_CODE instead of UNKNOWN_CODE as the constant is currently named).

Closes #225.

Screenshot of the new button:

Screen Shot 2022-04-26 at 9 22 47 PM
gaurav commented 2 years ago

Nothing that looks obviously questionable here, but I don't see a screenshot of how this would appear, and I don't see an example csv that this would produce. So kind of hard to confidently give 👍🏻.

Good catch! I've updated the PR description with a screenshot and a link to two example CSVs.

gaurav commented 2 years ago

Fixed in d3f7582277f9. The file now includes a "Phyloreference ID" field -- if an @id is set, it will be used; otherwise, each phyloreference is referred to as #phyloref0, #phyloref1, and so on. See example CSV output in Alligatoridae_Alligatorinae_and_4_others.csv.

gaurav commented 2 years ago

We're not including the type of specifier, so are leaving it up to the reader of the CSV to either assume it's a taxon name, or to apply some guessing logic. However, we do know the type, don't we? Same for expected and actual. Or maybe all we're saying is that these are node labels?

Ah, good catch! That's a bug on my part -- I wrote TaxonConceptWrapper instead of TaxonomicUnitWrapper, which has the logic for displaying specimens (with the phrase Specimen [occurrence ID]) or an external reference (in the format <url>). Fixed in 41dd4cc. I can't really test this since our only current test file only uses taxon names, but I've made a note on #232 to check the CSV export once we have more complicated phyloreferences in here.

It seems that the number of internal and external specifier columns is not fixed, but dynamic depending on however many the phyloreference with the most such specifiers happens to use. I suppose that's unavoidable when flattening out to CSV. But wouldn't it be better to have the columns whose number is dynamic appear last in the column order, rather than in the middle?

In theory, the phylogeny columns are also dynamic, as you are allowed to have multiple phylogenies in a single Phyx file. If there were multiple phylogenies in Alligatoridae_Alligatorinae_and_4_others.csv, each would be listed after all the other columns with the columns [phylogeny label] expected and [phylogeny label] actual. So even if I were to move them to before the specifier columns, they would still require some processing to be interpreted correctly.

I'll go ahead and merge this PR, but please do open an issues if you think we should output the phylogeny resolutions in CSV in another format.

hlapp commented 2 years ago

In theory, the phylogeny columns are also dynamic, as you are allowed to have multiple phylogenies in a single Phyx file.

Good point. So I think the main thing left is to document the CSV format, either in plain text, or for example in LinkML, so that downstream consumers know what to expect? Should that become an issue in the tracker so we don't forget?

gaurav commented 2 years ago

Good point. So I think the main thing left is to document the CSV format, either in plain text, or for example in LinkML, so that downstream consumers know what to expect? Should that become an issue in the tracker so we don't forget?

Sounds good! I'm not sure if LinkML supports dynamic column names in this way, but we should definitely document this somewhere. Filed as #257.