wurmlab / sequenceserver

Intuitive graphical web interface for running BLAST bioinformatics tool (i.e. have your own custom NCBI BLAST site!)
https://sequenceserver.com
GNU Affero General Public License v3.0

HTML code in fasta description #380

Open tomas-pluskal opened 6 years ago

tomas-pluskal commented 6 years ago

Hi,

I noticed that the beta versions of 1.1.0 changed the way HTML tags are rendered in FASTA file description fields.

In version 1.0.9, HTML tags in the descriptions were interpreted as HTML code, which allowed us to place things like links (<a href=...) or formatted annotations into the FASTA files that are loaded into sequenceserver. However, in version 1.1.0 all HTML tags are shown as plain text instead (I assume the < and > characters are translated to their corresponding HTML entities &lt; and &gt;).

Is this a desired change? In my view, it actually limits the scope of sequenceserver - I thought being able to add HTML formatting to the FASTA descriptions was quite useful.
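
For illustration, the escaping behavior described above can be sketched in a few lines of Python. This is not sequenceserver's actual code (which is Ruby and JavaScript); the description string and URL are made up, and `html.escape` from the Python standard library stands in for whatever escaping the application performs.

```python
import html

# Hypothetical FASTA description containing an HTML link (illustrative only).
description = 'putative kinase <a href="https://example.org/gene/1">details</a>'

# Escaping replaces markup characters with entities, so a browser displays
# the tag as literal text instead of rendering a clickable link.
escaped = html.escape(description)
print(escaped)
```

With escaping in place, `<` becomes `&lt;` and `>` becomes `&gt;`, which matches the plain-text rendering seen in 1.1.0.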

yeban commented 6 years ago

To minimize security risks, not interpreting any unknown HTML is the right thing to do by default. Any HTML snippet not already defined in the software's source code is unknown HTML, i.e., any user input. So the new behavior is indeed the preferred one.

Why not use the link generation feature to create custom links instead? http://www.sequenceserver.com/doc/#plugin

tomas-pluskal commented 6 years ago

The link generator is nice, but some basic formatting capabilities would be useful too (at least translating \n to <br>). Perhaps the descriptions could support something like Markdown syntax?
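
A minimal sketch of the newline idea, in Python for illustration only. It assumes the description carries a literal two-character `\n` sequence (a FASTA header is a single line, so it cannot hold a real line break), and the function name `render_description` is hypothetical. Everything is escaped first, so `<br>` is the only tag that can reach the page.

```python
import html

def render_description(description: str) -> str:
    """Escape all HTML, then turn the literal sequence \\n into <br>."""
    # Escape first so that any user-supplied markup stays inert.
    escaped = html.escape(description)
    # html.escape leaves backslashes alone, so the marker survives escaping.
    return escaped.replace("\\n", "<br>")

print(render_description(r"line one\nline <b>two</b>"))
```

Escaping before substituting keeps the default-safe behavior while still allowing one whitelisted formatting element.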

tomas-pluskal commented 5 years ago

Hi, I would like to return to this issue. I think an option to add simple formatting to the sequence descriptions would be useful. With a Markdown parser like kramdown (https://kramdown.gettalong.org/), this would be very easy to implement. What do you think?

yeban commented 5 years ago

I see the utility of it. Currently, adding custom links requires a bit of Ruby. But with embedded markdown, users can add them to the FASTA files using Perl, Python, bash, etc. Maybe embedded markdown can become the standard for adding custom links, while the link generator remains for automatic linking to public databases based on ID/title pattern. I think this feature should be opt-in (that is, disabled by default).
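
As a rough sketch of what opt-in embedded links could look like (Python for illustration, not kramdown and not sequenceserver code): the function name `render_with_links`, the `markdown_links` flag and the regular expression are all assumptions, and only Markdown inline links with http/https URLs are handled.

```python
import html
import re

# Matches Markdown inline links of the form [text](http://...) or [text](https://...).
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")

def render_with_links(description: str, markdown_links: bool = False) -> str:
    """Escape the description; convert Markdown links only when opted in."""
    escaped = html.escape(description)
    if not markdown_links:
        # Default: nothing in the description is interpreted.
        return escaped
    # The text was escaped above, so the captured groups contain no raw
    # quotes or angle brackets that could break out of the attribute.
    return LINK.sub(r'<a href="\2">\1</a>', escaped)

text = "kinase, see [UniProt](https://www.uniprot.org/) <b>"
print(render_with_links(text))
print(render_with_links(text, markdown_links=True))
```

Restricting URLs to http/https and escaping before substitution are design choices of this sketch; a full Markdown parser would need its own sanitization step.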

tomas-pluskal commented 5 years ago

I agree with the opt-in. I think it is useful not only for links, but also for highlighting stuff etc.

I can try to code this and make a pull request.

tomas-pluskal commented 5 years ago

@photocyte any thoughts on this?

photocyte commented 5 years ago

I think it is a good idea. Markdown formatting would support encoding of links and newlines, and the other formatting would be useful too.

yannickwurm commented 3 years ago

I like these ideas as they should make it easier to customize outlinks & add complementary information.

However, it can be considered bad practice to modify a FASTA file just to add metadata, because this makes it difficult to verify its integrity in comparison with reference databases/original downloads.

So I suggest a slightly different approach:

A major reason against this approach is that it doesn't piggy-back off BLAST's indexing. It is unclear to me how much of a burden (on server or on client-side) the additional RAM/time/download overhead of parsing the links files would be.

photocyte commented 3 years ago

Hello, thanks for the feedback. Regarding verifying a FASTA file's integrity vs. original downloads: internally I've come up with a seqkit-based FASTA checksum that pays attention to different levels of the sorted sequence content (e.g. all uppercase, to ignore whether softmasking was performed). In brief, it looks at a FASTA file with 4 different levels of scrutiny, with a standard md5sum checksum being the highest level, and makes a 4-piece checksum (so parts 1, 2, 3 and 4 all matching means something different from just part 4 matching). In my opinion, a checksum of just the file content breaks too easily with minor modifications of the FASTA file (e.g. shortening the FASTA record names). I thought the bioinformatics field would have come up with such a FASTA-specific checksum by now, but I haven't come across one... If there is interest, I could try to polish the documentation and release the checksum publicly.
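
To make the idea concrete, here is a hedged sketch of a four-part checksum in plain Python using `hashlib` rather than seqkit. The four levels below are my guesses at what such tiers could be, not the actual internal tool: part 1 is the MD5 of the raw file content, part 2 covers sorted header-plus-sequence records, part 3 covers sorted sequences only, and part 4 covers sorted uppercased sequences. All function names are hypothetical.

```python
import hashlib

def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def tiered_checksum(text):
    """Join four MD5 digests, ordered from strictest to loosest (assumed tiers)."""
    records = parse_fasta(text)
    seqs = [seq for _, seq in records]
    parts = [
        md5_hex(text),                                            # 1: raw file content
        md5_hex("\n".join(sorted(f"{h}\t{s}" for h, s in records))),  # 2: ignores record order and line wrapping
        md5_hex("\n".join(sorted(seqs))),                         # 3: also ignores record names
        md5_hex("\n".join(sorted(s.upper() for s in seqs))),      # 4: also ignores softmasking
    ]
    return "-".join(parts)

original = ">seq1 some description\nACGT\nacgt\n>seq2\nGGCC\n"
modified = ">s2\nGGCC\n>s1\nACGTACGT\n"
print(tiered_checksum(original))
print(tiered_checksum(modified))
```

In this example the two inputs differ in record names, order, line wrapping and case, so only the last part of the two checksums agrees, which is the "just part 4 matching" situation.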

Regarding this case here: I still conceptually like the idea of encoding metadata that could be displayed with sequenceserver in the FASTA header, because as a general rule I like metadata being explicitly linked to files (it is too easy for it to get lost if it lives in a separate file). Actually, to my recollection pure Markdown doesn't encode newlines, so it may not be suitable compared with escaped HTML. But I think @tomas-pluskal came up with a different approach for making metadata links with sequenceserver for https://github.com/transXpress/transXpress, though I am not immediately familiar with how it was done.