Issues with Workflow, esp. workflow_genome_sinteny_with_proteins doc

seoanezonjic / TarSynFlow

GNU General Public License v3.0

Hi there! Really interesting work, and I enjoyed reading the paper. I'm reaching out because I'm having some difficulty running your workflow. I've listed my questions below in case you have any solutions.

1) Just to confirm, based on the manuscript and README: the genomes are gene coding files (nucleotide FASTA format), correct? Could you upload the data you used for the manuscript to serve as example files?
2) The gen_ref and gen_queries files: can you specify what type of file the workflow needs (.txt, .sh, .doc, etc.), or better yet provide an example run-through with the data you used for the paper (as mentioned above)?
3) The "workflow_genome_sinteny_with_proteins" file looks as though it needs a lot of modifications to run, especially the module load commands, which are specific to a computer cluster. The README doesn't mention this (it only mentions changing the .sh files), but I am assuming these commands, along with the PATHS within this doc, need to be changed. That being said, are there other unspecified modifications needed to run on other machines?

Thanks!

Hi @gchaput19, I'll answer you point by point:

1) The genome files must be FASTA files with the chromosome/scaffold sequences of the organism, NOT the predicted ORFs of coding genes.

2) The gen_* files must contain the exact names of the files saved in the genome folder, so they are plain text files with no extension. You have to decide how to compare them (which genomes are your queries and which are your references; I use known strains as references and the genome of my project as the query). Example:

-gen_queries:

Pdp11_1.fasta

-gen_refs:

Shewanella_putrefaciens_SH16_micro22.fasta
Shewanella_putrefaciens_SH19_micro13.fasta
Shewanella_putrefaciens_SH2_micro1.fasta
Shewanella_putrefaciens_SH4_micro9.fasta
Shewanella_putrefaciens_SH6_micro12.fasta
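
For anyone who would rather generate these two list files than type them by hand, here is a minimal sketch in Python. It is not part of TarSynFlow: the helper `write_genome_list` and the folder name `genomes` are hypothetical, and the demo runs in a temporary directory with empty placeholder FASTA files. Where the workflow expects the gen_* files to be saved is not covered here, so check the README for that.

```python
from pathlib import Path
import tempfile


def write_genome_list(list_path, file_names, genome_dir):
    """Write one genome file name per line after checking each exists in genome_dir."""
    genome_dir = Path(genome_dir)
    missing = [name for name in file_names if not (genome_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Not found in {genome_dir}: {', '.join(missing)}")
    # Plain text, one exact file name per line, no file extension on the list itself.
    Path(list_path).write_text("\n".join(file_names) + "\n")


# Demo in a temporary directory; "genomes" is an assumed folder name.
work = Path(tempfile.mkdtemp())
genomes = work / "genomes"
genomes.mkdir()

queries = ["Pdp11_1.fasta"]
refs = [
    "Shewanella_putrefaciens_SH16_micro22.fasta",
    "Shewanella_putrefaciens_SH19_micro13.fasta",
]
# Empty placeholder files standing in for real genome FASTA files.
for name in queries + refs:
    (genomes / name).touch()

write_genome_list(work / "gen_queries", queries, genomes)
write_genome_list(work / "gen_refs", refs, genomes)
print((work / "gen_refs").read_text())
```

The existence check is there because the names in the lists have to match the files in the genome folder exactly, so a typo fails early instead of partway through a run.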

3) In fact, you can ignore the module commands. I run this platform on a supercomputer with the SLURM queue manager, where the software is made accessible through the module command. If you have installed all the workflow requirements and made them accessible through PATH, the platform should work: you will get a list of warnings that the module command was not found, but the results should still be generated. You don't need to modify the scripts to load the software; if it's in the PATH variable, my platform will use it.

Thank you for your interest in my work!
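
A quick way to confirm that the requirements are visible through PATH before launching anything is a check like the sketch below. It is only an illustration: `missing_tools` is a hypothetical helper, and `tool_a` and `tool_b` are placeholder names that should be replaced with the executables listed in the README requirements.

```python
import shutil


def missing_tools(tool_names):
    """Return the names from tool_names that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


# Placeholder names (assumption): replace with the real executables from the README.
required = ["tool_a", "tool_b"]

missing = missing_tools(required)
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

If nothing is reported missing, the "module: command not found" warnings on a machine without the module system should be harmless, as described above.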

Issues with Workflow, esp. workflow_genome_sinteny_with_proteins doc #1