phac-nml / neptune

Neptune: Genomic Signature Discovery
https://phac-nml.github.io/neptune/
Apache License 2.0

Reference name error #9

Closed: pavlo888 closed this issue 5 years ago

pavlo888 commented 5 years ago

Hi,

I am running into an issue when I run Neptune with 23 genomes. For the record, I am running Neptune from a Conda environment created specifically for it. Is there anything I can do to fix this issue? Thanks in advance.

```
Estimating k-mer size ...
k = 31

k-mer Counting...
Submitted 23 jobs.
Traceback (most recent call last):
  File "/anaconda2/envs/neptune_new/bin/neptune-conda", line 11, in <module>
    load_entry_point('neptune==1.2.5', 'console_scripts', 'neptune')()
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/Neptune.py", line 986, in main
    parse(parameters)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/Neptune.py", line 765, in parse
    executeParallel(parameters)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/Neptune.py", line 749, in executeParallel
    execute(execution)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/Neptune.py", line 655, in execute
    inclusionKMerLocations, exclusionKMerLocations = countKMers(execution)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/Neptune.py", line 238, in countKMers
    execution.jobManager.runJobs(jobs)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/JobManagerParallel.py", line 138, in runJobs
    self.synchronize(jobs)
  File "/anaconda2/envs/neptune_new/lib/python2.7/site-packages/neptune/JobManagerParallel.py", line 178, in synchronize
    job.get() # get() over wait() to propagate excetions upwards
  File "/anaconda2/envs/neptune_new/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
UnboundLocalError: local variable 'referenceName' referenced before assignment
```

emarinier commented 5 years ago

Hello,

I suspect Neptune is crashing when trying to build references for k-mer counting, because that's the only place in the code that "referenceName" is used:

```python
def buildReferences(referenceFile):

    references = {}

    # build references
    for line in referenceFile:

        # new reference:
        if line[0] == ">":
            tokens = (line[1:]).split()
            referenceName = tokens[0]

            references[referenceName] = ""

        # continue building reference:
        else:
            # referenceName is only bound after a ">" header line, so a file
            # whose first line is not a header raises UnboundLocalError here
            references[referenceName] += line.strip().upper()

    return references
```

I think the only situation in which "referenceName" would be referenced before assignment is when Neptune tries to build a reference from a file that doesn't start with a ">" character. This leads me to believe that you might have either an improperly formatted FASTA file or a non-FASTA file being included as input.
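The failure mode described above can be reproduced in isolation. The Python 3 sketch below runs a copy of the quoted `buildReferences` function on in-memory input (`io.StringIO` stands in for an open file, and the sequences are made-up examples): a well-formed FASTA input parses normally, while input whose first line is not a `>` header raises the same `UnboundLocalError` as in the traceback.

```python
import io


def buildReferences(referenceFile):
    """Copy of the function quoted above, with its indentation fixed."""
    references = {}

    for line in referenceFile:
        if line[0] == ">":
            # Header line: start a new, empty reference named by its first token.
            tokens = (line[1:]).split()
            referenceName = tokens[0]
            references[referenceName] = ""
        else:
            # Sequence line: append to the most recent reference. If no header
            # has been seen yet, referenceName is still unbound at this point.
            references[referenceName] += line.strip().upper()

    return references


# Well-formed input: lowercase sequence lines are uppercased and joined.
parsed = buildReferences(io.StringIO(">seq1 example\nacgt\nACGT\n>seq2\nTTGA\n"))
print(parsed)  # {'seq1': 'ACGTACGT', 'seq2': 'TTGA'}

# Input whose first line is not a header reproduces the reported exception.
try:
    buildReferences(io.StringIO("Sample notes, not FASTA\n>seq1\nACGT\n"))
    failure = None
except UnboundLocalError as error:
    # The exact message wording differs between Python versions.
    failure = error
print(type(failure).__name__)  # UnboundLocalError
```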

When specifying inclusion and exclusion targets:

A list of inclusion [or exclusion] targets in FASTA format. You may list multiple file or directory locations following the parameter. Neptune will automatically include all files within directories. However, Neptune will not recurse into additional directories.

https://phac-nml.github.io/neptune/parameters/
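The documented rule for directory inputs can be pictured with the following Python 3 sketch. `expand_targets` is a hypothetical name and this is not Neptune's actual implementation; it only illustrates why a stray text file or a hidden file sitting next to the genomes would be passed along with the FASTA files, while a subdirectory would not.

```python
import os
import tempfile


def expand_targets(locations):
    """Hypothetical helper mirroring the documented rules for inclusion and
    exclusion locations: files are used as given, a directory contributes
    every regular file directly inside it (hidden files included, no
    extension check), and subdirectories are not descended into."""
    files = []
    for location in locations:
        if os.path.isdir(location):
            for name in sorted(os.listdir(location)):
                path = os.path.join(location, name)
                if os.path.isfile(path):  # no recursion into subdirectories
                    files.append(path)
        else:
            files.append(location)
    return files


# Demo: one FASTA file, a stray text file, a hidden file, and a subdirectory.
with tempfile.TemporaryDirectory() as directory:
    for name in ("a.fasta", "notes.txt", ".hidden"):
        with open(os.path.join(directory, name), "w") as handle:
            handle.write(">seq1\nACGT\n" if name == "a.fasta" else "not FASTA\n")
    os.mkdir(os.path.join(directory, "sub"))
    found = [os.path.basename(path) for path in expand_targets([directory])]

print(found)  # ['.hidden', 'a.fasta', 'notes.txt']
```

Under these assumptions, only `a.fasta` is a valid input, yet all three files would reach the reference-building step.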

Also, because Neptune doesn't check FASTA file extensions, a mistakenly placed file in one of the directories might cause these sorts of problems. Please let me know if this was the problem.

Thank you.

pavlo888 commented 5 years ago

Hi emarinier,

I have checked the integrity of the FASTA files I am using, and they seem all right. I have even run a different package on the same FASTA files, and it runs fine.

I have also tried running Neptune with the test data included in the distribution folder, and it works.

I do not understand what is wrong. Can you recommend a tool for checking the integrity of FASTA files? So far I have only checked them manually.

Cheers, Pablo

emarinier commented 5 years ago

Hi Pablo,

Can you confirm that the very first character in every FASTA file is a ">" character with no spaces, newlines, or comments before it?

Can you also confirm, if you are specifying directories as input, that every directory contains only FASTA files? If any file in them is not in FASTA format (text files, output files, possibly hidden files), then you may see this error. Neptune does not check file extensions.
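Both checks can be automated with a short Python 3 sketch. `starts_with_header` and `files_without_header` are hypothetical helpers, not part of Neptune; they open each file in binary mode and test whether its very first byte is `>`, which flags leading spaces, blank first lines, byte order marks, and binary files alike.

```python
import os
import tempfile


def starts_with_header(path):
    """Return True if the very first byte of the file is '>'.

    Binary mode avoids decoding errors on non-text files such as hidden
    system files. A leading space, blank line, or UTF-8 byte order mark all
    make this return False. Empty files also return False, although an empty
    file would simply produce no references rather than this error."""
    with open(path, "rb") as handle:
        return handle.read(1) == b">"


def files_without_header(paths):
    """Return the paths whose first byte is not '>'."""
    return [path for path in paths if not starts_with_header(path)]


# Demo inputs (made-up examples of the problems described above).
samples = {
    "good.fasta": ">seq1\nACGT\n",
    "leading_space.fasta": " >seq1\nACGT\n",
    "blank_first_line.fasta": "\n>seq1\nACGT\n",
}

with tempfile.TemporaryDirectory() as directory:
    paths = []
    for name, text in samples.items():
        path = os.path.join(directory, name)
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)
    flagged = sorted(os.path.basename(p) for p in files_without_header(paths))

print(flagged)  # ['blank_first_line.fasta', 'leading_space.fasta']
```

To check a real input directory, pass every regular file it contains (including hidden ones) to `files_without_header`.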

Does Neptune run correctly if you specify individual files instead of directories? For example:

neptune --inclusion 1.fasta 2.fasta --exclusion 3.fasta 4.fasta --output output_directory
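When there are many genomes, the individual-file command can be assembled programmatically. The Python 3.8+ sketch below is an assumption-laden illustration: `fasta_files` and `neptune_command` are hypothetical helpers, and the accepted extensions in `FASTA_EXTENSIONS` are guesses to adjust to your data. Only the `--inclusion`, `--exclusion`, and `--output` parameters come from the example above.

```python
import os
import shlex

# Assumption: the extensions your FASTA files use; adjust as needed.
FASTA_EXTENSIONS = (".fasta", ".fa", ".fna")


def fasta_files(directory):
    """List FASTA-looking files directly inside directory (sorted),
    skipping hidden files and anything without a FASTA extension."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if not name.startswith(".")
        and name.lower().endswith(FASTA_EXTENSIONS)
        and os.path.isfile(os.path.join(directory, name))
    )


def neptune_command(inclusion, exclusion, output):
    """Build an argument list that names every input file individually."""
    return ["neptune", "--inclusion", *inclusion,
            "--exclusion", *exclusion, "--output", output]


command = neptune_command(["1.fasta", "2.fasta"], ["3.fasta", "4.fasta"],
                          "output_directory")
print(shlex.join(command))
# neptune --inclusion 1.fasta 2.fasta --exclusion 3.fasta 4.fasta --output output_directory
```

With real data, something like `neptune_command(fasta_files("inclusion_dir"), fasta_files("exclusion_dir"), "output_directory")` (placeholder directory names) could be passed to `subprocess.run(..., check=True)`. Filtering by extension and skipping hidden files is the point of this design: it keeps stray files out of the input list, which Neptune itself does not do.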

If none of these suggestions reveal the problem, are you able to share the command and files you are using so I can attempt to debug the problem?

I personally haven't used any tools for verifying the format of FASTA files, so I don't think I can recommend any to you.

pavlo888 commented 5 years ago

Hi Eric,

I have finally made it work! Your suggestion of specifying individual Fasta files instead of directories was helpful! Thank you very much for your help!!!! This tool will be very useful in my current work.

Best regards, Pablo

(Sent by email in reply to Eric Marinier's message of Wed, Jul 31, 2019 at 9:40 PM.)

emarinier commented 5 years ago

Great to hear!