steineggerlab / ufcg

UFCG: Universal Fungal Core Genes
https://ufcg.steineggerlab.com
GNU General Public License v3.0

NUC only uses ITS #29

Open JWDebler opened 3 months ago

JWDebler commented 3 months ago

Hi,

I finally managed to get ufcg running with --set NUC, only to realise that it then only uses ITS.

Why isn't it using all the core genes in NUC mode as it does in PRO mode? Can that be added?

I am working with lots of isolates of the same species and was hoping for better resolution using the nucleotide sequences instead of the protein sequences.

Cheers.

endixk commented 3 months ago

Hello,

If you want to use the nucleotide sequences of the core genes, you can run the tree module with the -a nucleotide option.

Your .ucg profiles generated in PRO mode should already contain the nucleotide sequences needed for this, unless you extracted your genes from a proteome input.
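As a small illustration of why nucleotide alignments can resolve closely related isolates better than protein alignments, the sketch below compares two made-up coding sequences that differ only by synonymous substitutions. The sequences, the partial codon table and the function names are assumptions for illustration and are not part of UFCG.

```python
# Partial standard codon table: only the codons used in this toy example.
CODON_TABLE = {
    "GCT": "A", "GCC": "A",  # alanine
    "GAA": "E", "GAG": "E",  # glutamate
    "AAA": "K",              # lysine
}


def translate(nucleotides):
    """Translate a coding sequence codon by codon using CODON_TABLE."""
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


def count_differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Two hypothetical isolates differing at two third-codon positions.
seq_a = "GCTGAAAAA"
seq_b = "GCCGAGAAA"

nucleotide_diffs = count_differences(seq_a, seq_b)
protein_diffs = count_differences(translate(seq_a), translate(seq_b))

print(nucleotide_diffs)  # 2
print(protein_diffs)     # 0
```

Both sequences translate to the same peptide, so the two differences are visible only at the nucleotide level.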

Hope this helps!

JWDebler commented 3 months ago

Thanks, I didn't realise that was possible.

Cheers.

Another question: I have now used the gene sequences from the PRO run. These are all isolates of the same species, so I expect everything to be very similar. However, I have a couple of alignments that look like this:

[image]

[image]

Is there a way to automatically trim them all to the same length?

endixk commented 3 months ago

That can be done by passing a strict gap-filtering threshold to the tree module, such as ufcg tree ... -f 10. This removes the alignment columns in which more than 10% of the species are represented by a gap, which gives the trimmed alignment you asked for.

You can even use -f 0 to keep only the columns that are fully aligned, which guarantees an equal-length alignment. This shouldn't be detrimental, since you are using isolates of the same species.
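To make the effect of the threshold concrete, here is a minimal Python sketch of column-wise gap filtering as described above. It is not UFCG's implementation; the function name, the toy alignment and the exact handling of the boundary (a column is kept when its gap percentage is at or below the threshold) are assumptions for illustration.

```python
def filter_gappy_columns(alignment, max_gap_percent):
    """Drop columns in which more than max_gap_percent of sequences have a gap.

    alignment: list of equal-length aligned sequences, with '-' as the gap.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    kept_columns = []
    for col in range(length):
        gaps = sum(seq[col] == "-" for seq in alignment)
        if 100.0 * gaps / n_seqs <= max_gap_percent:
            kept_columns.append(col)

    return ["".join(seq[col] for col in kept_columns) for seq in alignment]


# Toy alignment of four hypothetical isolates with ragged ends.
aln = [
    "--ATGCCGTA",
    "ATATGCCGTA",
    "ATATGCCG--",
    "-TATGCCGTA",
]

strict = filter_gappy_columns(aln, 0)    # keep only gap-free columns
loose = filter_gappy_columns(aln, 25)    # tolerate up to 25% gaps per column

print(strict)  # ['ATGCCG', 'ATGCCG', 'ATGCCG', 'ATGCCG']
print(loose)   # ['-ATGCCGTA', 'TATGCCGTA', 'TATGCCG--', 'TATGCCGTA']
```

With a threshold of 0 every remaining column is gap-free, so the ragged ends disappear; with a looser threshold some partially gapped columns survive.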

JWDebler commented 3 months ago

Excellent, thanks.