steineggerlab / ufcg

UFCG: Universal Fungal Core Genes
https://ufcg.steineggerlab.com
GNU General Public License v3.0

[Question] Train module issue #6

Open llk578496 opened 1 year ago

llk578496 commented 1 year ago

Hello @endixk ,

Thank you for developing this amazing pipeline! Our team is currently working on a clinical outbreak investigation of one of the most challenging multidrug-resistant fungi, Candida auris.

We would like to build a specific marker gene set for Candida auris by using the train module, based on all the Candida auris genomes available on NCBI with a complete or chromosome assembly level.

We have already downloaded a total of 45 genomes and created a directory containing all these reference genomes. However, when we tried to use the train module, we found that there was one more required option: -i STR Directory containing marker sequences in FASTA format (should be able to build an MSA).

May we know what data we should provide for this option?

Thank you very much!

Best regards, Eddie

endixk commented 1 year ago

Dear Eddie,

Thank you for using our pipeline!

Based on your description of your aim, it sounds like you are trying to identify marker genes for Candida auris de novo from your genome sequences. Unfortunately, this is currently beyond the capabilities of our pipeline.

The train module of our pipeline is designed to generate profile HMMs from a pre-defined set of marker genes, using an iterative training process with the given set of genome sequences to improve sensitivity. This means that in order to use the module, you will need to first identify a set of candidate marker genes for Candida auris.

One potential resource for identifying marker genes for this organism could be the OrthoDB Saccharomycetes subset. Once you have identified a set of candidate marker genes, you can create a FASTA file for each marker by gathering a handful of protein sequences. Then, you can provide a directory containing all of these FASTA files as the input for the -i option, and the train module will generate profile HMMs for the marker genes you provided.
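
As a rough illustration of the directory layout described above, here is a minimal Python sketch that writes one protein FASTA file per marker. The function name `write_marker_fastas`, the `.fa` suffix, the 60-column line wrapping, and the shape of the input dictionary are all assumptions made for this example and are not part of the pipeline. The check for at least two sequences per marker only reflects the note that the sequences should be able to build an MSA.

```python
from pathlib import Path


def write_marker_fastas(markers, out_dir, suffix=".fa", width=60):
    """Write one protein FASTA file per marker into out_dir.

    markers: {marker_name: {sequence_id: protein_sequence}}
    The suffix and line width are illustrative assumptions, not
    requirements taken from the pipeline.
    Returns the written paths, sorted by marker name.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for marker, seqs in sorted(markers.items()):
        # An alignment needs more than one sequence per marker.
        if len(seqs) < 2:
            raise ValueError(f"{marker}: need at least 2 sequences to build an MSA")
        path = out / f"{marker}{suffix}"
        with path.open("w") as fh:
            for seq_id, seq in seqs.items():
                fh.write(f">{seq_id}\n")
                # Wrap the sequence into fixed-width lines.
                for i in range(0, len(seq), width):
                    fh.write(seq[i:i + width] + "\n")
        written.append(path)
    return written
```

Calling `write_marker_fastas(my_markers, "markers/")` with your own marker-to-sequences mapping would produce a `markers/` directory that could then be passed to the -i option. Please check the pipeline documentation for the expected file naming before relying on this layout.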

If this explanation is unclear or if you have any further questions, please do not hesitate to ask.

Thanks!

Best wishes, Daniel