Closed pavlo888 closed 5 years ago
Hello,
I suspect Neptune is crashing when trying to build references for k-mer counting, because that's the only place in the code that "referenceName" is used:
def buildReferences(referenceFile):
    references = {}
    # build references
    for line in referenceFile:
        # new reference:
        if line[0] == ">":
            tokens = (line[1:]).split()
            referenceName = tokens[0]
            references[referenceName] = ""
        # continue building reference:
        else:
            references[referenceName] += line.strip().upper()
    return references
I think the only situation where "referenceName" would be referenced before assignment would be if it's trying to build a reference from a file that doesn't start with a ">" character. Which leads me to believe that you might either have an improperly formatted FASTA file or a file is being included that is not a FASTA file.
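A minimal sketch of that failure mode, for anyone who wants to reproduce it outside Neptune. The parsing logic mirrors the function quoted above; the helper `try_build` and the sample inputs are made up for illustration, and the sketch targets Python 3 rather than the Python 2.7 environment in the report.

```python
import io


def build_references(reference_file):
    """Same logic as the buildReferences function quoted above."""
    references = {}
    for line in reference_file:
        if line[0] == ">":
            # Header line: start a new, empty reference.
            reference_name = line[1:].split()[0]
            references[reference_name] = ""
        else:
            # Sequence line: fails if no header line has been seen yet.
            references[reference_name] += line.strip().upper()
    return references


def try_build(text):
    """Hypothetical helper: return the references, or the exception name."""
    try:
        return build_references(io.StringIO(text))
    except UnboundLocalError as error:
        return type(error).__name__


well_formed = ">seq1 description\nacgt\nttga\n"
leading_blank_line = "\n>seq1\nacgt\n"
not_fasta = "some notes left in the directory\n"

print(try_build(well_formed))         # {'seq1': 'ACGTTTGA'}
print(try_build(leading_blank_line))  # UnboundLocalError
print(try_build(not_fasta))           # UnboundLocalError
```

Note that even a single blank line before the first ">" is enough to trigger the error, not only a file in a completely different format.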
When specifying inclusion and exclusion targets:
A list of inclusion [or exclusion] targets in FASTA format. You may list multiple file or directory locations following the parameter. Neptune will automatically include all files within directories. However, Neptune will not recurse into additional directories.
https://phac-nml.github.io/neptune/parameters/
Also, because Neptune doesn't check FASTA file extensions, a mistakenly placed file in one of the directories might cause these sorts of problems. Please let me know if this was the problem.
Thank you.
Hi emerinier,
I have checked the integrity of the Fasta files I am using and they seem alright. I have even run a different package with the same Fasta files and the package runs fine.
I have also tried running Neptune with the test data included in the distribution folder and it works.
I do not understand what is wrong. Would you recommend any package to check for the integrity of the FASTA files? I have just checked the files manually.
Cheers, Pablo
Hi Pablo,
Can you confirm that the very first character in every FASTA file is a ">" character with no spaces, newlines, or comments before it?
Can you also confirm, if you are specifying directories as input, that every directory contains only FASTA files within them? If there is any file present that is not FASTA format (text files, output files, possibly hidden files), then you may see this error. Neptune does not check file extensions.
Does Neptune run correctly if you specify individual files instead of directories? For example:
neptune --inclusion 1.fasta 2.fasta --exclusion 3.fasta 4.fasta --output output_directory
If none of these suggestions reveal the problem, are you able to share the command and files you are using so I can attempt to debug the problem?
I personally haven't used any tools for verifying the format of FASTA files, so I don't think I can recommend any to you.
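In the absence of a dedicated tool, the first two checks above can be approximated with a short script. This is only a sketch: the function names `first_character` and `find_suspect_files` are made up here, and it tests nothing beyond the first character of each file, so it is not a full FASTA validator. It lists hidden files too, and it skips subdirectories on the assumption (from the parameter documentation) that Neptune does not recurse into them.

```python
import os
import sys


def first_character(path):
    """Return the first character of a file ('' if the file is empty)."""
    with open(path, "r", errors="replace") as handle:
        return handle.read(1)


def find_suspect_files(directory):
    """List files directly inside `directory` that do not start with '>'."""
    suspects = []
    for name in sorted(os.listdir(directory)):  # includes hidden dotfiles
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # subdirectories are skipped, not searched
        if first_character(path) != ">":
            suspects.append(path)
    return suspects


if __name__ == "__main__":
    # Usage: python check_fasta_dirs.py inclusion_dir exclusion_dir
    for directory in sys.argv[1:]:
        for path in find_suspect_files(directory):
            print("does not start with '>':", path)
```

Any path it prints is a candidate for the file that triggers the error when the whole directory is passed to Neptune.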
Hi Eric,
I have finally made it work! Your suggestion of specifying individual Fasta files instead of directories was helpful! Thank you very much for your help!!!! This tool will be very useful in my current work.
Best regards, Pablo
Great to hear!
Hi,
I am running into an issue when I run Neptune with 23 genomes. For the record, I am running Neptune from a Conda environment specifically created for it. Is there anything I could do to fix this issue? Thanks in advance.
```
Estimating k-mer size ... k = 31
k-mer Counting... Submitted 23 jobs.
Traceback (most recent call last):
  File "/anaconda2/envs/neptune_new/bin/neptune-conda", line 11, in <module>
    load_entry_point('neptune==1.2.5', 'console_scripts', 'neptune')()
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/Neptune.py", line 986, in main
    parse(parameters)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/Neptune.py", line 765, in parse
    executeParallel(parameters)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/Neptune.py", line 749, in executeParallel
    execute(execution)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/Neptune.py", line 655, in execute
    inclusionKMerLocations, exclusionKMerLocations = countKMers(execution)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/Neptune.py", line 238, in countKMers
    execution.jobManager.runJobs(jobs)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/JobManagerParallel.py", line 138, in runJobs
    self.synchronize(jobs)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/JobManagerParallel.py", line 178, in synchronize
    job.get() # get() over wait() to propagate excetions upwards
  File "/anaconda2/envs/neptune_new/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
UnboundLocalError: local variable 'referenceName' referenced before assignment
```
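The traceback ends inside `multiprocessing/pool.py` and shows no frame from the code that actually failed, because the exception was raised in a worker and only re-raised in the parent when `get()` was called on the job's result. A small sketch of that behaviour follows. It uses `multiprocessing.pool.ThreadPool` as a stand-in for Neptune's process pool (an assumption made to keep the example easy to run; both return the same kind of result object), and `failing_job` and `run_job` are made-up names.

```python
from multiprocessing.pool import ThreadPool


def failing_job():
    """Stand-in for a k-mer counting job that hits the parsing error."""
    raise UnboundLocalError(
        "local variable 'referenceName' referenced before assignment"
    )


def run_job(job):
    """Run `job` in a pool and report how wait() and get() behave."""
    with ThreadPool(processes=1) as pool:
        result = pool.apply_async(job)
        result.wait()  # blocks until the job ends, but never raises
        finished_cleanly = result.successful()
        try:
            result.get()  # re-raises the worker's exception in the caller
            raised = None
        except Exception as error:
            raised = type(error).__name__
    return finished_cleanly, raised


print(run_job(failing_job))  # (False, 'UnboundLocalError')
```

This is why the failing function has to be inferred from the variable name in the error message rather than read off the traceback.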