ohlab / SMEG

Strain-level Metagenomic Estimation of Growth rate (SMEG) measures growth rates of microbial strains from complex metagenomic datasets

reference genome issue in trying to build species database #8

Open eroofa opened 3 years ago

eroofa commented 3 years ago

Hello. I seem to have the same issue that two others reported previously when trying to build the species database: the build stops with the error "At least, a complete reference genome is required to reorder contigs of draft genome". There are several complete genomes in the genomes folder. I then tried the -r option to tell SMEG which genome is the reference, but got the same error. I also removed the plasmid from this reference (in case having both a chromosome and a plasmid was confusing SMEG), but even with just a single chromosome the same error occurs.

The rename utility is installed.

I am using the bioconda install on Ubuntu 18.04 (VM). The estimate functions work, the problem is the database building.

P.S. The download_genomes.sh script is also broken.

aemiol commented 3 years ago

Hi, sorry about that. I suspect some of the bash syntax is incompatible with Ubuntu. I will have to test the installation and the 'example test' on Ubuntu. In the meantime, if you can obtain SMEG via Singularity, you shouldn't encounter this problem.

Cheers, Tunde

ingotron commented 3 years ago

Hi,

I'm trying to use SMEG, also on Ubuntu systems (and Ubuntu under WSL). Singularity is bugging out, so for the time being I'm trying to make my conda installation work, and I'm running into this issue as well. After some debugging in the build_sp script, it seems that it breaks around the call to rename (build_sp line 38). If you disable the stderr redirection, you will see that rename throws an error like this:

Bareword "Thiomicrorhabdus_sediminis_str_G1" not allowed while "strict subs" in use at (user-supplied code).

Also, the intermediate file num_of_contigs will contain each genome file in column 1, but no contig count in column 2, e.g.:

Thiomicrorhabdus_sediminis_str_G1.fasta <nothing, should be 1>
Thiomicrospira_arctica_genomic.fna <nothing, should be 6>

I suspect that because rename does not work as intended, grep -c ">" fails to count contigs, and the reference genome file is not recognized. Incidentally, I think rename is the reason the download script fails, as well. What is the intended result of using rename here? I have been trying to code an Ubuntu-compatible workaround, but I can't extrapolate from the code how the file names are supposed to look after rename.

Cheers, /Ingo
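
The "Bareword ... not allowed" message is the kind of error Perl raises when a file name is evaluated as Perl code. That would be consistent with the Perl-based rename (File::Rename) receiving arguments written for the util-linux rename, which takes the form "rename from to file", though this has not been confirmed against build_sp itself. The following is a minimal sketch for checking both symptoms described above. The function names are hypothetical, and the version banner wording matched in rename_flavour is an assumption that may differ between releases.

```shell
# Classify a `rename --version` banner.
# ASSUMPTION: util-linux banners contain "util-linux" and the Perl
# implementation mentions "File::Rename"; other builds may differ.
rename_flavour() {
    case "$1" in
        *util-linux*)   echo "util-linux" ;;
        *File::Rename*) echo "perl" ;;
        *)              echo "unknown" ;;
    esac
}

# Count contigs in a FASTA file: one header line starting with ">" per contig.
count_contigs() {
    grep -c '^>' "$1"
}

# Print "<file> <contig count>" for each file given, so the result can be
# compared by eye with the intermediate num_of_contigs file.
list_contig_counts() {
    for f in "$@"; do
        printf '%s %s\n' "$f" "$(count_contigs "$f")"
    done
}
```

A possible way to use it: run rename_flavour "$(rename --version 2>&1)" to see which implementation is on the PATH, then list_contig_counts on the files in the genomes folder to confirm that the FASTA files themselves contain the expected number of headers.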

aemiol commented 3 years ago

Hi, one possible option, though untested, may be to replace rename "$f.noplasmid $f $f.noplasmid" on line 38 with:

cat $f.noplasmid > $f
rm $f.noplasmid

Again, this is untested, but I hope it resolves the 'rename' issue.

Cheers, Tunde
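
The replacement suggested above copies each .noplasmid file over the original name and then deletes the temporary file. If that is the intent of line 38, the same effect can be had with mv, which does not depend on which rename implementation is installed. The following is an untested-against-SMEG sketch: restore_noplasmid is a hypothetical helper, not part of build_sp, and it assumes the temporary files sit next to the originals with a ".noplasmid" suffix.

```shell
# Hypothetical helper (not part of SMEG): for every "<name>.noplasmid" file
# in the given directory, move it over "<name>".
# ASSUMPTION: this mirrors what line 38 of build_sp is meant to achieve.
restore_noplasmid() {
    dir=$1
    for tmp in "$dir"/*.noplasmid; do
        # Skip the unexpanded pattern when no .noplasmid files exist.
        [ -e "$tmp" ] || continue
        # Strip the suffix to recover the original file name.
        mv -- "$tmp" "${tmp%.noplasmid}"
    done
}
```

For example, restore_noplasmid /path/to/genomes would process every .noplasmid file in that folder. Quoting the variables also keeps the loop safe for file names that contain spaces.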