Closed lizaaminov closed 2 years ago
+1
This is especially annoying if the sequence names end with a "|": in that case there are no labels at all.
I looked into this because I was having the same problem, and it is not a problem with this repo.
Rather, the biojs-io-fasta module, which this repo uses to import FASTA sequences, treats the character "|" as a field separator in the name, in the form label|value. The last field is taken to be the name.
So, if your sequence is named database|v1.2|1abcA01 the parser will say its name is 1abcA01 and store the value v1.2 as the database the sequence is from. This value is available in the "ids" property of each sequence.
I would take it up with biojs-io-fasta or possibly the maintainers of this repo could add a way to display the contents of "ids" so that at least the information would not be lost?
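To make the behaviour described above concrete, here is a minimal sketch of that kind of header splitting. It is an illustration only, not the actual biojs-io-fasta source; the function name parseHeaderSketch and the exact pairing of fields are assumptions based on the description in this comment.

```javascript
// Hypothetical illustration of the header handling described above.
// This is NOT the biojs-io-fasta implementation.
function parseHeaderSketch(header) {
  const fields = header.split("|");
  // The last field is taken to be the displayed name.
  const name = fields.pop();
  const ids = {};
  // The remaining fields are read as label|value pairs.
  for (let i = 0; i + 1 < fields.length; i += 2) {
    ids[fields[i]] = fields[i + 1];
  }
  return { name: name, ids: ids };
}

const parsed = parseHeaderSketch("database|v1.2|1abcA01");
console.log(parsed.name); // "1abcA01"
console.log(parsed.ids);  // { database: "v1.2" }

// A header ending in "|" leaves an empty last field, hence an empty label.
console.log(parseHeaderSketch("database|v1.2|").name === ""); // true
```

Under these assumptions, a trailing "|" produces an empty name, which matches the missing labels reported above.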
It looks like the FASTA parser can be configured to use a custom parsing function for the header (e.g. https://github.com/wilzbach/bio.io#extendcustomparser ).
For example, around line 27 of src/utils/file.js, you can change the "fasta" case to something like:
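    case "fasta":
      // Custom header parser: keep the full header as the name.
      FastaReader.extend(function(header) {
        // id is the first space-delimited token; name is the untruncated header.
        return {id: header.split(" ")[0], name: header, details: {} };
      });
      reader = FastaReader;
      type = "seqs";
      break;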
I guess this FastaReader.extend method should really be exposed as an msa option to make it easily configurable.
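For anyone who wants to check what such a header function returns before wiring it in, here is a standalone sketch. The name fullHeaderParser is hypothetical, and the returned shape ({id, name, details}) simply follows the snippet above; whether FastaReader.extend accepts it exactly like this is an assumption I have not verified.

```javascript
// Hypothetical custom header function; it does not depend on any library.
// The {id, name, details} shape follows the snippet earlier in this thread.
function fullHeaderParser(header) {
  const trimmed = header.trim();
  // id: first whitespace-delimited token; name: the whole header, "|" included.
  const id = trimmed.split(/\s+/)[0];
  return { id: id, name: trimmed, details: {} };
}

const seq = fullHeaderParser("database|v1.2|1abcA01 example protein");
console.log(seq.id);   // "database|v1.2|1abcA01"
console.log(seq.name); // "database|v1.2|1abcA01 example protein"
```

Keeping the function separate from the FastaReader call makes it easy to test on a few real headers first.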
My label names contain special characters such as "|". As a result, the displayed label names are truncated: only the string after the last "|" is shown.
Is there a way to display the full label name without cutting it?
Thanks, Liza