getting fasta file for humans

Hi, unfortunately it has been perhaps over 10 years since I was involved in preparing UTR or other sequence files. If I recall correctly, we took the 3' UTRs as provided by Ensembl and, if the annotated UTR was shorter, extended it to something like a minimum of 500 bp after the stop codon. An additional complication is that 3' UTRs can contain introns, so we spliced those out beforehand. I assume it would not necessarily be an issue if this is not done. I would also expect Sylamer to still work quite well if you always took 1000 bp downstream of the stop codon (disregarding UTR annotation), although better annotations do lead to better results/signals.

Finally, we masked low-complexity sequence using DUST and removed repetitive sequences using RSAT purge-sequences (http://rsat.ulb.ac.be/rsat/), as described in the Supplementary Materials of the paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2635553/). These last two steps are quite important for combating things like GC bias and for avoiding spurious spikes caused by closely related genes or paralogs clumping together.
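To make the UTR extraction step more concrete, here is a rough Python sketch of the two options mentioned above: splicing the annotated 3' UTR exons and padding to a minimum length, or simply taking a fixed window downstream of the stop codon. This is not the original pipeline. The function names (`spliced_utr`, `downstream_window`), the 0-based half-open coordinate convention, and the choice to pad by extending the most downstream exon into the genome are all assumptions made for illustration. DUST masking and RSAT purge-sequences would still need to be run on the resulting FASTA afterwards.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (case preserved, N kept as N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def spliced_utr(chrom_seq, utr_exons, strand="+", min_len=500):
    """Join the 3' UTR exons (introns removed), padding to min_len if short.

    utr_exons: list of (start, end) genomic intervals, 0-based, half-open.
    Assumption: a short UTR is padded by extending its most downstream
    exon further into the genome, in the direction of transcription.
    """
    exons = sorted(utr_exons)
    total = sum(end - start for start, end in exons)
    shortfall = max(0, min_len - total)
    if shortfall:
        if strand == "+":
            start, end = exons[-1]
            exons[-1] = (start, min(len(chrom_seq), end + shortfall))
        else:
            start, end = exons[0]
            exons[0] = (max(0, start - shortfall), end)
    seq = "".join(chrom_seq[start:end] for start, end in exons)
    return seq if strand == "+" else revcomp(seq)


def downstream_window(chrom_seq, pos, strand="+", length=1000):
    """Fixed-length window downstream of the stop codon, ignoring annotation.

    pos: on "+", the first base after the stop codon; on "-", the lowest
    genomic coordinate of the stop codon (the window ends just before it).
    """
    if strand == "+":
        return chrom_seq[pos:pos + length]
    return revcomp(chrom_seq[max(0, pos - length):pos])
```

With Ensembl annotation you would feed in the UTR exon coordinates per transcript and write each result out as one FASTA record; near chromosome ends the returned sequence can be shorter than requested.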

I looked at the website you mention (genomique) and see no reason to think they use a different implementation from the one in this repository; I am also not aware of any other implementation. As for differences between versions, these should be small and negligible, related only to things such as binning differences; the method itself settled into a solid form very quickly and was never changed.

micans / sylamer
