Closed BradicM closed 9 months ago
Hi, unfortunately it's been perhaps over 10 years since I was involved with the preparation of UTR or other sequence files. If I recall correctly, we used to take the 3' UTRs as provided by Ensembl, with something like a minimum of 500 bp after the stop codon if the annotated UTR was shorter. There is the additional complication that 3' UTRs can have introns, so we would splice those out beforehand. I assume it would not necessarily be an issue if this is not done. I would also expect Sylamer to still work quite well if you always took 1000 bp downstream of the stop codon (disregarding UTR annotation), although better annotations do lead to better results/signals.
Finally, we would mask low-complexity sequence using DUST and also remove repetitive sequences using RSAT purge-sequences (http://rsat.ulb.ac.be/rsat/), as described in the Supplementary Materials of the paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2635553/). These last two steps are quite important, both to combat things like GC bias and to avoid spurious spikes caused by closely related genes or paralogs clumping together.
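To make the UTR extraction steps above concrete, here is a minimal Python sketch. It is not the original preparation pipeline: the function names, the 0-based half-open coordinates, and the forward-strand-only handling are all assumptions made for illustration, and padding a short UTR with the genomic sequence that follows it is just one reading of "a minimum of 500 bp after the stop codon". DUST masking and RSAT purge-sequences are not covered here.

```python
def splice_exons(genomic, exons):
    """Join exon intervals (0-based, half-open) in order, dropping introns."""
    return "".join(genomic[start:end] for start, end in sorted(exons))


def three_prime_utr(genomic, stop_end, utr_exons, min_len=500):
    """Spliced annotated 3' UTR, padded with downstream genomic sequence
    if it is shorter than min_len (assumed interpretation, forward strand).

    stop_end is the position just after the stop codon; it is only used
    when no UTR exons are annotated.
    """
    utr = splice_exons(genomic, utr_exons)
    if len(utr) < min_len:
        # Continue from the end of the last annotated UTR exon,
        # or from the stop codon if there is no annotation at all.
        last_end = max((end for _, end in utr_exons), default=stop_end)
        utr += genomic[last_end:last_end + (min_len - len(utr))]
    return utr


def downstream_window(genomic, stop_end, length=1000):
    """Annotation-free fallback: a fixed window after the stop codon."""
    return genomic[stop_end:stop_end + length]
```

For a reverse-strand gene you would need to reverse-complement the region first; that step is left out to keep the sketch short.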
I looked at the website you mention (genomique) and I see no reason to think they use a different implementation from the one in this repository. I am also not aware of any other implementation. As for differences between versions, these should be small and negligible, related only to things such as binning differences; the method itself settled into a solid form very quickly and was never changed.
Hello, I am attempting to run a Sylamer analysis, but I am unsure about the format and location of the fasta input file for human data. Could you please provide some guidance on this matter? Additionally, I would like to know whether the version of Sylamer at https://www.genomique.info/sylamer/ is expected to produce the same results as the previous version. Thank you for your help.