label names in MSA view truncated

wilzbach / msa

Modular BioJS component for a multiple sequence alignment

http://msa.biojs.net

Boost Software License 1.0


label names in MSA view truncated #212

Closed lizaaminov closed 2 years ago

lizaaminov commented 8 years ago

My label names contain special characters such as '|'. As a result, the displayed label names are truncated, and only the string after the last "|" is shown.

Is there a way to display the full label name without cutting it?

Thanks, Liza

blankclemens commented 7 years ago

+1. This is especially annoying if the sequence names end with a "|": in that case there are no labels at all.

avematthew commented 6 years ago

I looked into this because I was having the same problem, and it is not a problem with this repo.

Rather, the module biojs-io-fasta, which this repo uses to import FASTA sequences, treats the character "|" as a separator between fields in the name, in the form label|value. The last field is taken to be the name.

So, if your sequence is named database|v1.2|1abcA01, the parser will say its name is 1abcA01 and store the value v1.2 as the database the sequence is from. This value is available in the "ids" property of each sequence.
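To make the splitting concrete, here is a minimal sketch of the behaviour described above. It is not the actual biojs-io-fasta source: the function name `splitPipeHeader` is hypothetical, and the pairing logic is only an assumption based on the label|value description.

```javascript
// Hypothetical re-creation of the described behaviour;
// NOT the real biojs-io-fasta implementation.
function splitPipeHeader(header) {
  const fields = header.split("|");
  // The last field is taken to be the displayed name.
  const name = fields.pop();
  // The remaining fields are read as label|value pairs.
  const ids = {};
  for (let i = 0; i + 1 < fields.length; i += 2) {
    ids[fields[i]] = fields[i + 1];
  }
  return { name, ids };
}

// name is "1abcA01", ids.database is "v1.2"
console.log(splitPipeHeader("database|v1.2|1abcA01"));

// A header ending in "|" leaves an empty name, i.e. no visible label.
console.log(splitPipeHeader("sp|P12345|"));
```

Under these assumptions, the second call would also explain the report above that headers ending in "|" show no label at all.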

I would take it up with biojs-io-fasta. Alternatively, the maintainers of this repo could add a way to display the contents of "ids", so that at least the information would not be lost.

pansapiens commented 6 years ago

It looks like the FASTA parser can be configured to use a custom parsing function for the header (e.g. https://github.com/wilzbach/bio.io#extendcustomparser ).

For example, around line 27 of src/utils/file.js, you can change the "fasta" case to something like:

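case "fasta":
    // Override the default header parser so "|" is not treated as a field separator.
    FastaReader.extend(function(header) {
        // id: first whitespace-delimited token; name: the full, untruncated header.
        return {id: header.split(/\s+/)[0], name: header, details: {} };
    });
    reader = FastaReader;
    type = "seqs";
    break;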

I guess this FastaReader.extend method should really be exposed as an msa option to make it easily configurable.
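A header callback of this kind can also be tried on its own, without msa, to check what label it would produce. The sketch below is only illustrative: `keepFullHeader` is a hypothetical name, and the returned {id, name, details} shape is assumed from the snippet above rather than taken from the parser's documentation.

```javascript
// Hypothetical helper; the {id, name, details} shape is assumed
// from the snippet in this thread, not from the library docs.
function keepFullHeader(header) {
  const trimmed = header.trim();
  return {
    // id: first whitespace-delimited token of the header
    id: trimmed.split(/\s+/)[0],
    // name: the whole header, so "|" no longer truncates the label
    name: trimmed,
    details: {}
  };
}

const parsed = keepFullHeader("database|v1.2|1abcA01 some description");
console.log(parsed.name); // "database|v1.2|1abcA01 some description"
console.log(parsed.id);   // "database|v1.2|1abcA01"
```

If it behaves as wanted, the same function could presumably be passed to FastaReader.extend in place of the inline callback.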