hettling closed this issue 9 years ago.
I use SequenceGetter::get_markers_for_accession to get short descriptions. This queries GenBank for the seed accession number and then looks in the feature table of that sequence for gene names. This works reasonably well for getting a short name (e.g. "COI").
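For readers not using BioPerl, the feature-table lookup described above can be sketched in Python roughly as follows. This is a minimal, dependency-free illustration, not what SequenceGetter does internally. It assumes the GenBank record is already available as flat-file text and takes the first /gene qualifier in the FEATURES section as the short name. It also ignores qualifier values that wrap over several lines. Both function names are hypothetical.

```python
import re

# Matches an indented /gene="..." qualifier line in a GenBank feature table.
GENE_QUALIFIER = re.compile(r'^\s+/gene="([^"]+)"')


def gene_names_from_genbank(record_text):
    """Hypothetical helper: gene names from the FEATURES section, first-seen order."""
    names = []
    in_features = False
    for line in record_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if in_features and line and not line[0].isspace():
            break  # a new top-level keyword (e.g. ORIGIN) ends the feature table
        if in_features:
            match = GENE_QUALIFIER.match(line)
            if match and match.group(1) not in names:
                names.append(match.group(1))
    return names


def short_marker_name(record_text, default="unknown"):
    """Hypothetical helper: first gene name as a short marker label, e.g. "COI"."""
    names = gene_names_from_genbank(record_text)
    return names[0] if names else default
```

In practice, Biopython's SeqIO GenBank parser (which exposes each feature's qualifiers) would be the more robust way to read the feature table.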
If we're changing the marker table anyway, it would be great if it were TSV throughout so that it displays better on GitHub (and is easier to read in R and such).
Note that get_markers_for_accession is now called by the tree plotter, and that this is a costly operation because BioPerl automatically throttles requests to GenBank. Since running the plotter is a somewhat iterative process (trying out different width x height dimensions), this is very wasteful. It would be better to do this once, when building the marker tables.
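One way to pay for the lookup only once is a small on-disk cache that repeated plotter runs can reuse. A minimal sketch follows. It assumes a two-column accession/marker-name TSV file and a `lookup` callable standing in for get_markers_for_accession. The file layout and both function names are hypothetical, not part of the existing code.

```python
import csv
import os


def load_marker_cache(path):
    """Hypothetical helper: read accession -> marker name pairs from a TSV file, if present."""
    cache = {}
    if os.path.exists(path):
        with open(path, newline="") as handle:
            for accession, name in csv.reader(handle, delimiter="\t"):
                cache[accession] = name
    return cache


def marker_names(accessions, lookup, path):
    """Hypothetical helper: resolve marker names, calling the slow lookup only for unseen accessions."""
    cache = load_marker_cache(path)
    # dict.fromkeys removes duplicate accessions while keeping their order.
    missing = [acc for acc in dict.fromkeys(accessions) if acc not in cache]
    for acc in missing:
        cache[acc] = lookup(acc)  # stand-in for the expensive, throttled GenBank query
    if missing:
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            for acc, name in sorted(cache.items()):
                writer.writerow([acc, name])
    return [cache[acc] for acc in accessions]
```

With this, the plotter only queries GenBank for accessions not yet in the file. The simpler option suggested above is to store the names directly in the marker tables when they are built.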
Both the backbone and the clade marker tables now have column names giving the marker name of the corresponding cluster seed GI. The tables are now pure TSV, without the descriptions at the bottom.
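For illustration, writing and reading a pure TSV table with marker names as column headers needs nothing beyond Python's csv module. The layout below is an assumption made for the example: a leading taxon column, one row per taxon, and one column per marker. It is not necessarily the exact layout of the backbone and clade tables, and the function names are hypothetical.

```python
import csv


def write_marker_table(handle, markers, rows):
    """Hypothetical helper: write a header of marker names, then one TSV row per taxon.

    markers: marker names used as column headers (e.g. "COI").
    rows: (taxon, [one cell per marker]) tuples; this layout is illustrative only.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["taxon"] + list(markers))
    for taxon, cells in rows:
        writer.writerow([taxon] + list(cells))


def read_marker_table(handle):
    """Hypothetical helper: read the table back as one dict per row, keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))
```

A file like this renders as a table on GitHub and can be read in R with read.delim().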
smrt-utils mlookup does not seem to have problems with the new tables, and smrt-utils plot was updated to deal with them. I think I can close this issue now; please reopen it if you encounter problems.
Markers used for inference are exported in a table by smrt bbmerge and smrt bbdecompose, where each marker represents a row. The description of each marker at the bottom of the file is misleading, since only the description of the cluster seed sequence for each marker is shown. However, the cluster seed sequence is often not even part of the data used, and its taxon is not always in our list of taxa. It would be good to parse a short, very general description for each marker.