naturalis / supersmart

Self-Updating Platform for the Estimation of Rates of Speciation, Migration And Relationships of Taxa
MIT License

Better description of markers in marker table #70

Closed: hettling closed this issue 9 years ago

hettling commented 9 years ago

Markers used for inference are exported in a table by smrt bbmerge and smrt bbdecompose, with one row per marker. The description of each marker at the bottom of the file is misleading because it only shows the description of that marker's cluster seed sequence. The seed sequence, however, is often not even part of the data used, and its taxon is not always in our list of taxa.

It would be good to parse a short, very general description for each marker.

rvosa commented 9 years ago

I use SequenceGetter::get_markers_for_accession to get short descriptions. This queries GenBank for the seed accession number and then looks in the feature table of that sequence for gene names. This works reasonably well for getting a short name (e.g. "COI").
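To illustrate the idea of reading gene names out of a feature table, here is a minimal Python sketch. It is not the SUPERSMART implementation (which is Perl and queries GenBank through BioPerl); it only scans a GenBank-style flat-file excerpt for `/gene` qualifiers. The sample record and the function name `gene_names` are made up for illustration.

```python
import re

# Hypothetical GenBank-style feature table excerpt, invented for this example.
SAMPLE_FEATURES = """\
FEATURES             Location/Qualifiers
     source          1..658
                     /organism="Example species"
     gene            <1..>658
                     /gene="COI"
     CDS             <1..>658
                     /gene="COI"
                     /product="cytochrome oxidase subunit I"
"""


def gene_names(feature_table):
    """Return the distinct /gene qualifier values, in order of first appearance."""
    names = []
    for match in re.finditer(r'/gene="([^"]+)"', feature_table):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


print(gene_names(SAMPLE_FEATURES))  # ['COI']
```

A record without any `/gene` qualifier yields an empty list, so a caller would need some fallback for the short name.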

If we're changing the marker table anyway, it would be great if it were TSV throughout so that it displays better on GitHub (and is easier to read in R and such).

rvosa commented 9 years ago

Note that get_markers_for_accession is now called by the tree plotter, and that this is a costly operation because BioPerl automatically throttles requests to GenBank. Since running the plotter is a bit of an iterative process (trying out different width x height dimensions), this is very wasteful. It would be better if this was done once, when building the marker tables.
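The "look up once, reuse afterwards" idea could be sketched roughly as follows. This is an assumption-laden Python illustration, not the project's code: `resolve_once`, `to_tsv`, `from_tsv`, the two-column layout and the accessions are all hypothetical, and `fake_lookup` stands in for the throttled GenBank query.

```python
import csv
import io


def resolve_once(seed_accessions, lookup):
    """Call the costly lookup at most once per distinct accession."""
    names = {}
    for accession in seed_accessions:
        if accession not in names:
            names[accession] = lookup(accession)
    return names


def to_tsv(names):
    """Serialise accession -> marker name as a two-column TSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["seed_accession", "marker_name"])
    for accession, name in names.items():
        writer.writerow([accession, name])
    return buffer.getvalue()


def from_tsv(text):
    """Read the stored names back without any remote lookup."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["seed_accession"]: row["marker_name"] for row in reader}


calls = []  # records every simulated remote request


def fake_lookup(accession):
    # Stand-in for the slow remote query; accessions are invented.
    calls.append(accession)
    return {"ACC0001": "COI", "ACC0002": "rbcL"}[accession]


# Table-building step: the only place where lookups happen.
stored = to_tsv(resolve_once(["ACC0001", "ACC0002", "ACC0001"], fake_lookup))

# Plotting step, repeated while tuning dimensions: reads stored names only.
for _ in range(3):
    names = from_tsv(stored)

print(names, len(calls))  # {'ACC0001': 'COI', 'ACC0002': 'rbcL'} 2
```

However often the plotting step is repeated, the number of simulated remote requests stays at one per distinct seed accession.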

hettling commented 9 years ago

Both the backbone and clade marker tables now have column names that give the marker name of the corresponding cluster seed GI. The tables are now pure TSV, without the descriptions at the bottom. smrt-utils mlookup does not seem to have problems with the new tables, and smrt-utils plot was updated to deal with them. I think I can close this issue now; please reopen if you encounter problems.
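As a rough illustration of why a pure TSV table with marker names as column headers is convenient to consume, the Python sketch below counts how many taxa have a sequence for each marker. The exact layout of the real tables is not shown in this thread, so the `taxon` column, the taxon names and the accessions here are assumptions.

```python
import csv
import io

# Hypothetical marker table: one row per taxon, one column per marker.
TABLE = (
    "taxon\tCOI\trbcL\n"
    "Taxon_a\tACC0001\tACC0003\n"
    "Taxon_b\tACC0002\t\n"
)


def marker_coverage(tsv_text):
    """Count the non-empty cells in every marker column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    markers = [name for name in reader.fieldnames if name != "taxon"]
    counts = {marker: 0 for marker in markers}
    for row in reader:
        for marker in markers:
            if row[marker]:
                counts[marker] += 1
    return counts


print(marker_coverage(TABLE))  # {'COI': 2, 'rbcL': 1}
```

Because there are no free-text description lines at the bottom, a standard TSV reader can parse the whole file without special handling.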