wtmatlock / flanker

Gene-flank analysis tool
MIT License

Question about Flanker Docs #57

Open haruosuz opened 1 year ago

haruosuz commented 1 year ago

Hello,

https://flanker.readthedocs.io/en/latest/

The "--wstop" and "--wstep" options should be written with a single dash instead of a double dash, as follows:

flanker --flank upstream --window 0 -wstop 5000 -wstep 100 --gene blaKPC-2 --fasta_file david_plasmids.fasta --include_gene

Using double dash instead of single dash with these options results in an error, as shown below:

usage: flanker [-h] -i FASTA_FILE (-g GENE [GENE ...] | -log LIST_OF_GENES)
               [-cm] [-f FLANK] [-m MODE] [-circ] [-inc] [-db DATABASE]
               [-v [VERBOSE]] [-w WINDOW] [-wstop WINDOW_STOP]
               [-wstep WINDOW_STEP] [-cl] [-o OUTFILE] [-tr THRESHOLD]
               [-p THREADS] [-k KMER_LENGTH] [-s SKETCH_SIZE]
flanker: error: unrecognized arguments: --wstop 5000 --wstep 100
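
The error above follows from how argparse works: `-wstop` and `--wstop` are distinct option strings, and only the spellings a program registers are accepted. As a rough way to check which spellings an installed version registers, the sketch below extracts option names from a usage message. The helper name `usage_flags` is hypothetical and not part of Flanker.

```shell
# Print the option names found in an argparse-style usage message on stdin.
# usage_flags is a hypothetical helper, not part of Flanker.
usage_flags() {
  # Put every run of characters outside [A-Za-z0-9_-] on its own line,
  # then keep only tokens that look like options (-x or --xyz).
  tr -c 'A-Za-z0-9_-' '\n' | grep -E '^--?[A-Za-z]' | sort -u
}
```

For example, `flanker -h | usage_flags | grep wst` should list the accepted spellings of the window options, assuming the help text has the same shape as the usage message quoted above.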

wtmatlock commented 1 year ago

Hi Haruo.

haruosuz commented 1 year ago

Thank you for the advice.

https://flanker.readthedocs.io/en/latest/#clustering

I used the --cluster flag with the following command to run Flanker:

flanker --fasta_file ${FASTA_FILE} --gene blaKPC-2 --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster

6848519.zip

I have attached a zip file that contains the stdout, stderr, and output files generated by running my script with the following command:

FILE=./data/test.fasta
qsub -v fasta="$FILE" pbs.flanker.sh

In the stderr file, the following ERROR messages were generated:

Error: Gene blaKPC-2 not found in MT560078.1
ERROR: could not open "/var/tmp/pbs.6848519.bias5-adm/tmp002m72ry/mash.msh" for reading.

The 50 output files (out_blaKPC-2_*) held no results; each contained only the following header line:

assembly_1,cluster

wtmatlock commented 1 year ago

It looks like that plasmid contains blaKPC-3, not blaKPC-2: https://www.ncbi.nlm.nih.gov/nuccore/MT560078.1/. I wonder whether this is the case for the others?

Perhaps run this instead:

flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster

which should grab either blaKPC-2 or blaKPC-3.

Thanks,

Will
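
To see which blaKPC alleles an input actually carries before choosing a `--gene` value, one option is to tally gene names from an Abricate report. This is a minimal sketch under stated assumptions: the report is tab-separated with a header column named GENE, the prefix match only mirrors the behaviour described above rather than Flanker's actual matching code, and `count_alleles` is a hypothetical helper.

```shell
# Tally genes whose name starts with a given prefix.
# Input: a tab-separated Abricate report on stdin (assumed to have a
# header row containing a column named GENE).
# count_alleles is a hypothetical helper, not part of Flanker or Abricate.
count_alleles() {
  awk -F'\t' -v prefix="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "GENE") col = i; next }
    col && index($col, prefix) == 1 { n[$col]++ }
    END { for (g in n) print g, n[g] }
  ' | sort
}
```

A possible invocation is `abricate "${FASTA_FILE}" | count_alleles blaKPC`, using the same database that Flanker is given.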

haruosuz commented 1 year ago

Thank you for your suggestion. I used the --gene blaKPC option instead of --gene blaKPC-2, and passed the FASTA file available at https://github.com/wtmatlock/flanker/blob/main/flanker/tests/data/test.fasta as the input via --fasta_file ${FASTA_FILE}.

6849120.zip

I have attached a zip file that contains the stdout, stderr, and output files.

Upon reviewing the stderr file, I observed 50 ERROR messages similar to the following:

ERROR: could not open "/var/tmp/pbs.6849120.bias5-adm/tmpgnonmznl/mash.msh" for reading.

I suspect that this is why none of the 50 output files (out_blaKPC_*) contained any results.

haruosuz commented 1 year ago

Dear Will,

I hope this message finds you well.

I was wondering if you have any advice or suggestions on how to resolve the "ERROR: could not open" issue mentioned above?

I would greatly appreciate any help you can provide.

Thank you for your time and assistance.

Best regards, Haruo Suzuki

wtmatlock commented 1 year ago

Dear Haruo,

My apologies, I am writing up my thesis at the moment so a little stretched thin!

Just to check, is flanker writing the flanking sequences okay? To cluster, flanker looks for files in your current working directory that end with flank.fasta.

Thanks,

Will
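
Since clustering picks up whatever files ending in flank.fasta are in the working directory, it can help to count those files per window before running with --cluster. The sketch below assumes the file naming seen in this thread, where the window size is the third underscore-separated field from the end (for example `..._blaKPC-3_100_upstream_flank.fasta`). `flank_counts_by_window` is a hypothetical helper.

```shell
# Count files ending in flank.fasta in a directory, grouped by window size.
# Assumes names like <contig>_<gene>_<window>_<side>_flank.fasta, so the
# window is the third underscore-separated field from the end.
# flank_counts_by_window is a hypothetical helper, not part of Flanker.
flank_counts_by_window() (
  dir=${1:-.}
  find "$dir" -maxdepth 1 -type f -name '*flank.fasta' |
    awk -F_ '{ n[$(NF-2)]++ } END { for (w in n) print w, n[w] }' |
    sort -n
)
```

Running `flank_counts_by_window .` prints one line per window with the number of flank files found, which shows whether files from an earlier run are still present.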

haruosuz commented 1 year ago

Thank you for your response, despite your busy schedule.

6865273.zip

I have attached a zip file that contains the stdout, stderr, and output files generated by running the following commands:

flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100
flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster

Based on my understanding, the first command generated files that end with flank.fasta, while the second command generated files that start with out_.

Upon reviewing the stderr file, I observed 49 ERROR messages similar to the following:

ERROR: could not open "/var/tmp/pbs.6865273.bias5-adm/tmpq82lfabj/mash.msh" for reading.

ERROR: could not open "/var/tmp/pbs.6865273.bias5-adm/tmp4jdf3be0/mash.msh" for reading.

Of the 50 files, out_blaKPC_0 has 2501 lines, while the other 49 files contain only the single line of column names (assembly_1,cluster), as follows:

$wc -l out_* | head -n 3
  2501 out_blaKPC_0
     1 out_blaKPC_100
     1 out_blaKPC_1000

$head -n 3 out_blaKPC_0
assembly_1,cluster
ENA_CABFYD010000003_CABFYD010000003.1_blaKPC-3_3700_upstream_flank.fasta,0
ENA_CABFYI010000002_CABFYI010000002.1_blaKPC-3_3500_upstream_flank.fasta,0

$tail -n 3 out_blaKPC_0 
MT560066.1_blaKPC-2_1100_upstream_flank.fasta,62
ENA_CABGBR010000009_CABGBR010000009.1_blaKPC-2_1000_upstream_flank.fasta,63
ENA_CABFYR010000005_CABFYR010000005.1_blaKPC-2_1000_upstream_flank.fasta,63

$head out_blaKPC_100
assembly_1,cluster
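
The same check can be scripted so that only the output files lacking cluster assignments are listed. This is a minimal sketch that assumes the outputs are named out_* and begin with one header line, as shown above. `header_only_outputs` is a hypothetical helper.

```shell
# List out_* files in a directory that hold at most the header line.
# header_only_outputs is a hypothetical helper, not part of Flanker.
header_only_outputs() (
  dir=${1:-.}
  for f in "$dir"/out_*; do
    [ -f "$f" ] || continue
    # awk reports the line count without padding, unlike some wc builds.
    lines=$(awk 'END { print NR }' "$f")
    if [ "$lines" -le 1 ]; then
      printf '%s\n' "${f##*/}"
    fi
  done
)
```

`header_only_outputs . | wc -l` then gives the number of windows for which clustering produced no rows.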

wtmatlock commented 1 year ago

Hi Haruo,

I just ran the clustering mode as follows. First, I made test.fasta containing the contigs CABFYD010000003.1 and CABFYI010000002.1. Then, I ran

$flanker --fasta_file * --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 -cl

found 2 files

found 2 files

...

found 2 files

Checking via your method:

$wc -l out_* | head -n 3
       3 out_blaKPC_0
       3 out_blaKPC_100
       3 out_blaKPC_1000

Indeed, it seems to have worked:

$cat out* | sed '/assembly/d'  > all_out
$cat all_out | head -6
ENA|CABFYI010000002|CABFYI010000002.1_blaKPC-3_0_upstream_flank.fasta,0
ENA|CABFYD010000003|CABFYD010000003.1_blaKPC-3_0_upstream_flank.fasta,0
ENA|CABFYD010000003|CABFYD010000003.1_blaKPC-3_100_upstream_flank.fasta,0
ENA|CABFYI010000002|CABFYI010000002.1_blaKPC-3_100_upstream_flank.fasta,0
ENA|CABFYI010000002|CABFYI010000002.1_blaKPC-3_1000_upstream_flank.fasta,0
ENA|CABFYD010000003|CABFYD010000003.1_blaKPC-3_1000_upstream_flank.fasta,0

I would suggest first trying to reproduce the above. If that works, you should be good:)

Thanks,

Will

haruosuz commented 1 year ago

Dear Will,

Thank you for your previous response.

flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100
flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster

I have a question about the need for the first command that I used, which generated files ending with flank.fasta. Do I need to run this command before using the clustering option, or is it unnecessary? I ask because running the clustering mode with and without the first command produced different outputs.

"I would suggest first trying to reproduce the above."

I failed to reproduce your results. Here are the steps I followed:

First, I made my_test.fasta containing the contigs CABFYD010000003.1 and CABFYI010000002.1 using:

seqkit grep -nrp "CABFYD010000003.1|CABFYI010000002.1" data/test.fasta > my_test.fasta

Then, I ran the clustering mode as follows.

$flanker --fasta_file my_test.fasta --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 -cl

This generated messages like the following:

found 2,102 files

ERROR: could not open "/tmp/tmp02lduhz7/mash.msh" for reading.

It appears that Flanker is having difficulty accessing the mash.msh file. Is there a solution to this issue? For instance, is it feasible to designate an alternative directory in which Flanker creates the file?

I have attached a zip file that contains the stdout and stderr files and the 50 output files generated by running the command.

2023-04-28-0756.zip
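
On the question of an alternative directory: the temporary paths in the errors (tmp followed by eight random characters) resemble those created by Python's tempfile module, which consults the TMPDIR environment variable. Assuming Flanker creates its scratch directory that way (not confirmed in this thread), the sketch below runs a command with TMPDIR pointing at a directory of your choice. `run_with_tmpdir` is a hypothetical helper.

```shell
# Run a command with TMPDIR set to a directory we control.
# Assumption: the program picks its scratch location via Python's tempfile
# module, which honours TMPDIR. Not confirmed for Flanker.
# run_with_tmpdir is a hypothetical helper, not part of Flanker.
run_with_tmpdir() (
  scratch=$1
  shift
  mkdir -p "$scratch" || exit 1
  env TMPDIR="$scratch" "$@"
)
```

For example: `run_with_tmpdir "$PWD/flanker_tmp" flanker --fasta_file my_test.fasta --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 -cl`. This only relocates the scratch directory. It does not by itself explain why mash.msh is missing.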

Dx-wmc commented 4 months ago

I have the same problem. In clustering mode, the same error as above is reported. Could you provide a Docker container to improve reproducibility?

wtmatlock commented 4 months ago

Hi there,

I have tried again to reproduce this issue (see my comment above your post) and am having no luck. What kind of system are you running it on?

Thanks,

Will

Dx-wmc commented 4 months ago

Hi, I'm using Ubuntu 22. On this system I always have problems installing, whereas on CentOS I get no errors, so I don't know whether the system is the cause. To avoid these kinds of errors, it might be better to release a Docker image in the future.

wtmatlock commented 4 months ago

Yes, we are working on a Docker install now, actually. Hopefully we will have it on the repo soon!