wtmatlock / flanker

Gene-flank analysis tool
MIT License

Question about Flanker Docs #57

Open haruosuz opened 1 year ago

haruosuz commented 1 year ago



The "--wstop" and "--wstep" options should be written with a single dash instead of a double dash, as follows:

flanker --flank upstream --window 0 -wstop 5000 -wstep 100 --gene blaKPC-2 --fasta_file david_plasmids.fasta --include_gene

Using double dash instead of single dash with these options results in an error, as shown below:

usage: flanker [-h] -i FASTA_FILE (-g GENE [GENE ...] | -log LIST_OF_GENES)
               [-cm] [-f FLANK] [-m MODE] [-circ] [-inc] [-db DATABASE]
               [-v [VERBOSE]] [-w WINDOW] [-wstop WINDOW_STOP]
               [-wstep WINDOW_STEP] [-cl] [-o OUTFILE] [-tr THRESHOLD]
               [-p THREADS] [-k KMER_LENGTH] [-s SKETCH_SIZE]
flanker: error: unrecognized arguments: --wstop 5000 --wstep 100
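The usage message above has the shape of Python argparse output and lists only single-dash spellings such as -wstop and -wstep. The sketch below is not Flanker's source: the parser, the --window_stop/--window_step long names, and the defaults are assumptions made for illustration. It shows how argparse accepts -wstop as an exact match, while --wstop is handed back as an unrecognized argument because it is neither a registered option nor a prefix of one.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring three options from the usage message."""
    parser = argparse.ArgumentParser(prog="flanker-sketch")
    parser.add_argument("-w", "--window", type=int, default=1000)
    # Assumed long names; only the single-dash forms appear in the usage text.
    parser.add_argument("-wstop", "--window_stop", type=int, default=None)
    parser.add_argument("-wstep", "--window_step", type=int, default=None)
    return parser


parser = build_parser()

# Single-dash spelling: matched exactly, nothing left over.
ok_args, ok_extras = parser.parse_known_args(
    ["-w", "0", "-wstop", "5000", "-wstep", "100"]
)

# Double-dash spelling: "--wstop" is not registered, so argparse returns it
# (and its value) in the list of unknown arguments instead of parsing it.
bad_args, bad_extras = parser.parse_known_args(
    ["-w", "0", "--wstop", "5000", "--wstep", "100"]
)

print(ok_args.window_stop, ok_extras)
print(bad_args.window_stop, bad_extras)
```

With parse_args instead of parse_known_args, the second call would exit with an "unrecognized arguments" error like the one shown above.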

wtmatlock commented 1 year ago

Hi Haruo.

haruosuz commented 1 year ago

Thank you for the advice.


I used the --cluster flag with the following command to run Flanker:

flanker --fasta_file ${FASTA_FILE} --gene blaKPC-2 --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster


I have attached a zip file that contains the stdout, stderr, and output files generated by running my script with the following command:

qsub -v fasta="$FILE" pbs.flanker.sh

In the stderr file, the following ERROR messages were generated:

Error: Gene blaKPC-2 not found in MT560078.1
ERROR: could not open "/var/tmp/pbs.6848519.bias5-adm/tmp002m72ry/mash.msh" for reading.

The 50 output files <outblaKPC-2*> were effectively empty, containing only the following single line:

wtmatlock commented 1 year ago

It looks like that plasmid contains blaKPC-3, not blaKPC-2: https://www.ncbi.nlm.nih.gov/nuccore/MT560078.1/. I wonder whether this is the case for the others?

Perhaps run this instead:

flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster

which should grab either blaKPC-2 or blaKPC-3.
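To illustrate why the shorter query can match both alleles, here is a minimal sketch of substring-style matching. It is an assumption about the behaviour the reply describes, not Flanker's actual matching code. The function name and the second contig are hypothetical; only the MT560078.1/blaKPC-3 pairing comes from this thread.

```python
def contigs_with_gene(annotations: dict[str, list[str]], query: str) -> list[str]:
    """Return contig IDs carrying at least one gene whose name contains `query`."""
    return [
        contig
        for contig, genes in annotations.items()
        if any(query in gene for gene in genes)
    ]


# MT560078.1 carrying blaKPC-3 comes from the thread; the second entry is a
# made-up contig used only to show a blaKPC-2 hit.
annotations = {
    "MT560078.1": ["blaKPC-3"],
    "example_contig_1": ["blaKPC-2"],
}

exact_hits = contigs_with_gene(annotations, "blaKPC-2")  # misses MT560078.1
family_hits = contigs_with_gene(annotations, "blaKPC")   # matches both contigs

print(exact_hits)
print(family_hits)
```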



haruosuz commented 1 year ago

Thank you for your suggestion. I used the --gene blaKPC option instead of --gene blaKPC-2, and used the fasta file available at https://github.com/wtmatlock/flanker/blob/main/flanker/tests/data/test.fasta as the input for --fasta_file ${FASTA_FILE}.


I have attached a zip file that contains the stdout, stderr, and output files.

Upon reviewing the stderr file, I observed 50 ERROR messages similar to the following:

ERROR: could not open "/var/tmp/pbs.6849120.bias5-adm/tmpgnonmznl/mash.msh" for reading.

I suspect that this may be the reason why all 50 output files <outblaKPC*> did not contain the results.

haruosuz commented 1 year ago

Dear Will,

I hope this message finds you well.

I was wondering if you have any advice or suggestions on how to resolve the "ERROR: could not open" issue mentioned above?

I would greatly appreciate any help you can provide.

Thank you for your time and assistance.

Best regards, Haruo Suzuki

wtmatlock commented 1 year ago

Dear Haruo,

My apologies, I am writing up my thesis at the moment so a little stretched thin!

Just to check, is flanker writing the flanking sequences okay? To cluster, flanker looks for files in your current working directory that end with flank.fasta.
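A quick way to answer that check is to list the files in the working directory whose names end with flank.fasta. The helper below is a hypothetical convenience script, not part of Flanker; the only detail taken from the thread is the flank.fasta suffix.

```python
from pathlib import Path


def list_flank_files(directory: Path) -> list[str]:
    """Names of regular files in `directory` ending with 'flank.fasta'."""
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.name.endswith("flank.fasta")
    )


# Report what a clustering run started from the current directory would see.
found = list_flank_files(Path.cwd())
print(f"{len(found)} file(s) ending with flank.fasta in {Path.cwd()}")
for name in found:
    print(name)
```

Because the search is by suffix in the current directory, leftover files from earlier runs would also be listed, so it may be worth running this in a clean directory.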



haruosuz commented 1 year ago

Thank you for your response, despite your busy schedule.


I have attached a zip file that contains the stdout, stderr, and output files generated by running the following commands:

flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100
flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster

Based on my understanding, the first command generated files that end with , while the second command generated files that start with .

Upon reviewing the stderr file, I observed 49 ERROR messages that are similar to the following:

ERROR: could not open "/var/tmp/pbs.6865273.bias5-adm/tmpq82lfabj/mash.msh" for reading.

ERROR: could not open "/var/tmp/pbs.6865273.bias5-adm/tmp4jdf3be0/mash.msh" for reading.

Of the 50 files, the file has 2501 lines, while the other 49 files only have a single line of column names (assembly_1,cluster), as follows:

$wc -l out_* | head -n 3
  2501 out_blaKPC_0
     1 out_blaKPC_100
     1 out_blaKPC_1000

$head -n 3 out_blaKPC_0

$tail -n 3 out_blaKPC_0 

$head out_blaKPC_100
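The same check as the wc -l call can be scripted so that header-only outputs are flagged directly. This is a sketch with hypothetical function names; it assumes, as in the listing above, that output files start with out_ and that a file with a single line holds only the column names.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Number of lines in a text file."""
    with path.open() as handle:
        return sum(1 for _ in handle)


def header_only_outputs(directory: Path, prefix: str = "out_") -> list[str]:
    """Names of files starting with `prefix` that hold at most one line."""
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.name.startswith(prefix) and count_lines(p) <= 1
    )


# List the outputs in the current directory that contain no cluster rows.
for name in header_only_outputs(Path.cwd()):
    print(name)
```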

wtmatlock commented 1 year ago

Hi Haruo,

I just ran the clustering mode as follows. First, I made test.fasta containing the contigs CABFYD010000003.1 and CABFYI010000002.1. Then, I ran

$flanker --fasta_file * --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 -cl

found 2 files

found 2 files


found 2 files

Checking via your method:

$wc -l out_* | head -n 3
       3 out_blaKPC_0
       3 out_blaKPC_100
       3 out_blaKPC_1000

Indeed, it seems to have worked:

$cat out* | sed '/assembly/d'  > all_out
$cat all_out | head -6
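For anyone who prefers to script this step, the cat and sed pipeline can be approximated in Python as below. The function name is hypothetical. Like the sed expression, it drops every line containing "assembly", which would also remove a data row whose identifier happens to contain that word.

```python
from pathlib import Path


def merge_cluster_tables(directory: Path, prefix: str = "out") -> list[str]:
    """Concatenate lines from files whose names start with `prefix`,
    skipping any line that contains 'assembly' (the header rows)."""
    merged: list[str] = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.name.startswith(prefix):
            with path.open() as handle:
                merged.extend(
                    line.rstrip("\n") for line in handle if "assembly" not in line
                )
    return merged


# Write the merged rows to all_out, mirroring the shell redirection.
rows = merge_cluster_tables(Path.cwd())
Path("all_out").write_text("".join(row + "\n" for row in rows))
```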

I would suggest first trying to reproduce the above. If that works, you should be good:)



haruosuz commented 1 year ago

Dear Will,

Thank you for your previous response.

flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100
flanker --fasta_file ${FASTA_FILE} --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 --cluster

I have a question about the need for the first command that I used, which generated files ending with . Do I need to run this command before using the clustering option, or is it unnecessary? I ask because when I ran the clustering mode with and without the first command, it resulted in different outputs.

You wrote: "I would suggest first trying to reproduce the above."

I failed to reproduce your results. Here are the steps I followed:

First, I made containing the contigs CABFYD010000003.1 and CABFYI010000002.1 using:

seqkit grep -nrp "CABFYD010000003.1|CABFYI010000002.1" data/test.fasta > my_test.fasta

Then, I ran the clustering mode as follows.

$flanker --fasta_file my_test.fasta --gene blaKPC --include_gene --flank upstream --window 0 -wstop 5000 -wstep 100 -cl

This generated messages like the following:

found 2,102 files

ERROR: could not open "/tmp/tmp02lduhz7/mash.msh" for reading.

It appears that Flanker is experiencing difficulties accessing the file. Is there a solution to this issue? For instance, is it feasible to designate an alternative directory for Flanker to create the file?
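On the question of an alternative directory: the paths in the error messages (for example /tmp/tmp02lduhz7/) resemble directories created by Python's tempfile module, which chooses its parent directory from the TMPDIR environment variable. Whether Flanker creates its scratch directory through tempfile is an assumption here; the thread does not confirm it, and redirecting the directory may not resolve the mash.msh error. The sketch below only demonstrates the tempfile behaviour itself, using a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def scratch_parent_under(tmp_root: Path) -> Path:
    """Create a throwaway directory with TMPDIR pointed at `tmp_root` and
    return the (resolved) parent directory it was created in."""
    saved_env = os.environ.get("TMPDIR")
    saved_cache = tempfile.tempdir
    os.environ["TMPDIR"] = str(tmp_root)
    tempfile.tempdir = None  # forget the cached default so TMPDIR is re-read
    try:
        with tempfile.TemporaryDirectory() as scratch:
            return Path(scratch).resolve().parent
    finally:
        # Restore the previous environment and cache.
        if saved_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = saved_env
        tempfile.tempdir = saved_cache


# Example: ask tempfile to work under a directory we know is writable.
with tempfile.TemporaryDirectory() as writable_root:
    print(scratch_parent_under(Path(writable_root)))
```

If the assumption holds, exporting TMPDIR to a writable directory in the shell or PBS script before calling flanker would have the same effect.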

I have attached a zip file that contains the , , and the 50 files generated by running the command.


Dx-wmc commented 4 months ago

I have the same problem. In clustering mode, I get the same error as above. Could you provide a Docker container to improve reproducibility?

wtmatlock commented 4 months ago

Hi there,

I have tried again to reproduce this issue (see my reply above your post) and am having no luck. What kind of system are you running it on?



Dx-wmc commented 4 months ago

Hi, I'm using Ubuntu 22. On this system I always have problems installing, but on CentOS I get no errors, so I wonder whether the operating system is the cause. As a suggestion, releasing a Docker image in the future might help avoid these kinds of errors.

wtmatlock commented 4 months ago

Yes, we are working on a Docker install now, actually. Hopefully we will have it on the repo soon!