Closed by LemoAlex 3 years ago
This line from the logfile seems odd:
[01/11/21 08:23:44]: 9,557 total contigs; skipping -51,760 contigs with no genes
Do you have the predict logfile that I could look at as well?
Hmm, okay, thanks. I can't quite tell, but it looks like the --species argument may not be getting passed properly. If you look at the command printed in the log file:
/venv/bin/funannotate predict -i /home/alexandre/funannotate/fish.masked.fa -o ./output1 -s Species name--transcript_evidence /home/alexandre/funannotate/Alignment/Tran.fa --optimize_augustus --other_gff /home/alexandre/funannotate/Tran.fa.transdecoder.gff3 --protein_evidence uniprot-catfish-reviewed.fasta uniprot-zebrafish-reviewed.fasta --organism other --rna_bam /home/alexandre/funannotate/sorted.bam --weights codingquarry:1 --cpus 4
I don't know that this would necessarily cause problems with EVM per se, but it looks like it may just be a typo? In your initial command above there is clearly a space.
-s Species name--transcript_evidence /home/alexandre/funannotate/Alignment/Tran.fa
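To make the difference concrete, here is a minimal sketch of how the two strings split into arguments under POSIX shell quoting rules. The file name Tran.fa is shortened for illustration, and whether the logfile string reflects what was really passed to the program (rather than just how the command was logged) cannot be determined from this thread.

```python
import shlex

# Command as typed in the original post: the species name is quoted.
intended = '-s "Species name" --transcript_evidence Tran.fa'
# Command as printed in the predict logfile: no quotes, and no space
# before --transcript_evidence.
logged = '-s Species name--transcript_evidence Tran.fa'

intended_args = shlex.split(intended)
logged_args = shlex.split(logged)

print(intended_args)  # the species name stays one argument
print(logged_args)    # the flag is fused onto the second word of the name
```

If the logged form were what the argument parser received, --transcript_evidence would never be seen as a flag of its own.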
So assuming the above is not related to the error, you can try running the EVM command from that same directory; maybe that will yield more info on stdout, i.e.:
funannotate-docker /venv/bin/python /venv/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py -w /home/alexandre/funannotate/output1/predict_misc/weights.evm.txt -c 4 -g /home/alexandre/funannotate/output1/predict_misc/gene_predictions.gff3 -d /home/alexandre/funannotate/output1/predict_misc/EVM -f /home/alexandre/funannotate/output1/predict_misc/genome.softmasked.fa -l ./output1/logfiles/funannotate-EVM.log -m 10 -o /home/alexandre/funannotate/output1/predict_misc/evm.round1.gff3 --EVM_HOME /venv/opt/evidencemodeler-1.1.1 -p /home/alexandre/funannotate/output1/predict_misc/protein_alignments.gff3 -t /home/alexandre/funannotate/output1/predict_misc/transcript_alignments.gff3
Actually, that will probably fail based on what I have in the bash script. You can create a new bash wrapper like this that will just run the image (it is the same, it just doesn't include the call to funannotate):
#!/usr/bin/env bash
realpath() {
    OURPWD=$PWD
    cd "$(dirname "$1")"
    LINK=$(readlink "$(basename "$1")")
    while [ "$LINK" ]; do
        cd "$(dirname "$LINK")"
        LINK=$(readlink "$(basename "$1")")
    done
    REALPATH="$PWD/$(basename "$1")"
    cd "$OURPWD"
    echo "$REALPATH"
}
timezone() {
    if [ "$(uname)" == "Darwin" ]; then
        TZ=$(readlink /etc/localtime | sed 's#/var/db/timezone/zoneinfo/##')
    else
        TZ=$(readlink /etc/timezone)
    fi
    echo $TZ
}
# Only allocate tty if one is detected. See - https://stackoverflow.com/questions/911168
if [[ -t 0 ]]; then IT+=(-i); fi
if [[ -t 1 ]]; then IT+=(-t); fi
USER="$(id -u $(logname)):$(id -g $(logname))"
WORKDIR="$(realpath .)"
MOUNT="type=bind,source=${WORKDIR},target=${WORKDIR}"
TZ="$(timezone)"
exec docker run --rm "${IT[@]}" --user "${USER}" -e TZ="${TZ}" --workdir "${WORKDIR}" --mount "${MOUNT}" nextgenusfs/funannotate:latest "$@"
Here is a generalized version of this bash script -- you could run with any docker container: https://github.com/nextgenusfs/dw/
Hello again,
Thanks for the answers. I tried removing the spaces in the species name, but I still get the same error.
I also tried running the EVM step using the bash script through dw, but again I get exactly the same output as when running the whole pipeline. I also get (as I did before) a single file called genes.1.bed in the predict_misc/EVM folder. It feels like EVM can't get past the first scaffold. Could this be possible?
Thanks, Alexandre
I suppose it could be running out of RAM. Can you increase the RAM allocated to docker?
Never mind, I saw your log file and it is already 264 GB.
When you call this, are all of the files you are passing to the docker container located in the same run directory?
Another thing to try would be to start the docker image interactively and then run the EVM workflow from inside it, i.e. docker run -it -v {need to mount filesystem folders} nextgenusfs/funannotate /bin/bash
And then lastly, I assume the test dataset runs on your system?
funannotate-docker test -t rna-seq --cpus XX
One other thing to try would be to delete all of the EVM temp files and then add --no-evm-partitions to your predict command (I just realized it's not in the help menu). This will run the partitioning differently, in case that is what is causing EVM to die.
But going back to my original thought about the EVM log file, this line seems strange:
[01/11/21 08:23:44]: 9,557 total contigs; skipping -51,760 contigs with no genes
What is happening in the code is this:
# sort the results by contig and position
ChrGeneCounts = {}
sortedResults = natsorted(Results, key=lambda x: (x[0], x[1]))
with open(bedGenes, 'w') as outfile:
    for x in sortedResults:
        outfile.write('{}\t{}\t{}\t{}\t{}\t{}\n'.format(x[0], x[1], x[2],
                                                        x[3], x[4], x[5]))
        if not x[0] in ChrGeneCounts:
            ChrGeneCounts[x[0]] = 1
        else:
            ChrGeneCounts[x[0]] += 1
ChrNoGenes = len(SeqRecords) - len(ChrGeneCounts)
lib.log.debug('{:,} total contigs; skipping {:,} contigs with no genes'.format(len(SeqRecords), ChrNoGenes))
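A minimal sketch of how that subtraction can go negative: ChrNoGenes is the number of genome sequences minus the number of distinct contig names seen in the gene predictions, so it drops below zero whenever the predictions name more sequences than the genome contains. The data below are made up for illustration, and plain sorted() stands in for natsorted to keep the example dependency-free. For the log line above, 9,557 - (-51,760) means 61,317 distinct contig names appeared in the gene predictions.

```python
# Hypothetical stand-in for SeqRecords: a genome with 3 contigs.
seq_records = {"scaffold_1": "ACGT", "scaffold_2": "ACGT", "scaffold_3": "ACGT"}

# Hypothetical stand-in for Results: gene models whose first field is the
# sequence name. Here the names come from a transcriptome, not the genome.
results = [
    ("TRINITY_DN1_c0_g1_i1", 10, 200),
    ("TRINITY_DN2_c0_g1_i1", 5, 150),
    ("TRINITY_DN3_c0_g1_i1", 1, 90),
    ("TRINITY_DN4_c0_g1_i1", 20, 300),
    ("TRINITY_DN5_c0_g1_i1", 7, 77),
    ("TRINITY_DN5_c0_g1_i1", 100, 180),
]

# Count gene models per sequence name, as the loop in the snippet does.
chr_gene_counts = {}
for contig, start, end in sorted(results, key=lambda x: (x[0], x[1])):
    chr_gene_counts[contig] = chr_gene_counts.get(contig, 0) + 1

# 3 genome contigs minus 5 distinct names in the predictions.
chr_no_genes = len(seq_records) - len(chr_gene_counts)
message = '{:,} total contigs; skipping {:,} contigs with no genes'.format(
    len(seq_records), chr_no_genes)
print(message)  # 3 total contigs; skipping -2 contigs with no genes
```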
A negative count like this suggests something is wrong with the input files (something I've not seen before): the gene predictions refer to more than 50k sequence names beyond the contigs in the genome.
Most likely the headers on one of the input files are off. Can you validate that the input files have appropriate FASTA/sequence headers? For example, do the sequence names in the custom GFF you are passing match the genome FASTA headers? And do the headers in the BAM file match as well?
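One way to run that check, as a rough sketch rather than anything funannotate provides: collect the sequence IDs from the genome FASTA headers and from column 1 of the GFF3, then list the GFF3 IDs that are absent from the genome. The function names and the in-memory example data are hypothetical.

```python
def fasta_ids(lines):
    """Collect sequence IDs: the first word after '>' on each header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def gff_seqids(lines):
    """Collect the column-1 sequence IDs from GFF3 feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_from_genome(fasta_lines, gff_lines):
    """Return GFF3 sequence IDs that have no matching genome header."""
    return sorted(gff_seqids(gff_lines) - fasta_ids(fasta_lines))


# Made-up example data: one GFF3 record sits on a transcript, not a scaffold.
genome = [">scaffold_1 len=4\n", "ACGT\n", ">scaffold_2\n", "ACGT\n"]
gff = [
    "##gff-version 3\n",
    "scaffold_1\tsrc\tgene\t1\t4\t.\t+\t.\tID=g1\n",
    "TRINITY_DN1_c0_g1_i1\ttransdecoder\tgene\t1\t4\t.\t+\t.\tID=g2\n",
]
missing = missing_from_genome(genome, gff)
print(missing)
```

For real files, open file handles can be passed in place of the lists, since both functions only iterate over lines. A non-empty result means the GFF3 was built against different sequences than the genome.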
For example, the custom GFF that you are passing do they match the genome FASTA headers?
Ok, maybe the problem is there! My GFF file comes from Transdecoder, but I used the transcriptome as an input. So obviously, the transcriptome and the genome don't have the same headers. Could the problem come from there? What could I use as an alternative then?
Thanks,
Alexandre
So if the transcripts aren't aligned to the genome reference, then the GFF shouldn't be passed to --other_gff. If you have transcripts from Transdecoder that you want to align, you can pass those in FASTA format to --transcript_evidence; this option takes multiple inputs, space delimited.
Maybe it's not obvious, but the pipeline might work a lot better if you let funannotate train run Trinity/PASA/transdecoder. That way those tools get run in a way that produces output in a format funannotate knows.
Hi,
Sorry for the long delay. Just to let you know that I ran it as you suggested and I was able to finish the whole pipeline successfully, so thank you!
Best, Alexandre
Hello funannotate users,
I am currently using funannotate v1.8.4, installed through docker; funannotate check and the tests work without issues.
I am trying to run funannotate predict on a fish genome assembly.
So, when I run:
funannotate-docker predict -i ~softmasked.genome.fasta -o ./output1 -s "Species name" --transcript_evidence Transcriptome.fasta --optimize_augustus --other_gff /home/alexandre/funannotate/Species.transdecoder.gff3 --protein_evidence uniprot.reviewed.fasta uniprot-reviewed.fasta --organism other --rna_bam ~/funannotate/alignment.bam --weights codingquarry:1 --cpus 4
Everything runs smoothly until the EvidenceModeler part. Then I get this message:
EVM: partitioning input to ~ 35 genes per partition
Traceback (most recent call last):
  File "/venv/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py", line 433, in <module>
    partitions=args.no_partitions)
  File "/venv/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py", line 203, in create_partitions
    k, len(SeqRecords[k])))
  File "/venv/lib/python3.7/site-packages/Bio/File.py", line 248, in __getitem__
    record = self._proxy.get(self._offsets[key])
KeyError: 'scaffold_1'
[Jan 11 08:24 AM]: Evidence modeler has failed, exiting
Traceback (most recent call last):
  File "/venv/bin/funannotate", line 713, in <module>
    main()
  File "/venv/bin/funannotate", line 703, in main
    mod.main(arguments)
  File "/venv/lib/python3.7/site-packages/funannotate/predict.py", line 1730, in main
    os.remove(EVM_out)
FileNotFoundError: [Errno 2] No such file or directory: '~/output1/predict_misc/evm.round1.gff3'
The EVM logfile (attached) does not show any error, so I am a bit confused about what's going on here.
Thanks for the help,
Best, Alexandre
funannotate-EVM.log