thegenemyers / DALIGNER

Find all significant local alignments between reads

running HPC.daligner script #81

Closed: peritob closed this issue 6 years ago

peritob commented 6 years ago

daligner -v -b -k16 -t100 -Dlocalhost:1234 -r1 -j16 DB.5 DB.4 DB.3 DB.2 DB.1

My apologies for not fully understanding your very detailed explanation and thanks in advance for your help.

I am using the HPC.daligner script within the Marvel assembler; one of its lines is shown above. I frequently run out of resources and have to restart when, judging from output.txt, perhaps only DB.5.DB.5.las and DB.5.DB.4.las have completed.

Two questions:

  1. Will daligner restart from the last point and continue appropriately? (I keep checkpoint files for restarts.)

  2. It appears that each alignment step completes before the next one begins. Does this mean the script can be altered to skip the files that have already completed? For example, since DB.5.DB.4.las completed, can I change the script to:

daligner -v -b -k16 -t100 -Dlocalhost:53778 -r1 -j16 DB.5 DB.3 DB.2 DB.1

thegenemyers commented 6 years ago

If a block-to-block comparison has completed successfully, then yes, you do not have to repeat it. That is, the answer to 2 is yes: you don't have to do DB.5 versus DB.4 again.

As for 1, no: daligner does not checkpoint its intermediate progress. The unit of granularity is one block-versus-block comparison.
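Taken together, these answers mean that block pairs which finished can be dropped from a restarted daligner line, while a comparison interrupted midway must be rerun from scratch. The shell sketch below mirrors the edit proposed in question 2: it keeps the leading block and drops every following block B for which `<lead>.B.las` already exists and is non-empty. It assumes the output naming reported in this thread (DB.5.DB.4.las for DB.5 versus DB.4) and that the .las files sit in the current directory; `remaining_daligner_cmd` is a hypothetical helper, not part of DALIGNER or Marvel. A job killed mid-comparison may leave a partial file behind, so confirm completion in the log (as was done here with output.txt) before trusting a file's presence.

```shell
# Hypothetical helper: print a reduced daligner command that skips block
# pairs whose .las output already exists (and is non-empty).
# Assumes the naming <lead>.<block>.las and that files are in the current dir.
# The function body runs in a subshell ( ... ), so its variables stay local.
remaining_daligner_cmd() (
  opts=$1     # option string, copied verbatim into the new command
  target=$2   # leading block, always kept
  shift 2
  todo=""
  for blk in "$@"; do
    # -s: true only if the file exists and has size > 0
    if [ ! -s "$target.$blk.las" ]; then
      todo="$todo $blk"
    fi
  done
  # Print nothing when every listed pair is already done
  if [ -n "$todo" ]; then
    echo "daligner $opts $target$todo"
  fi
)
```

For example, with DB.5.DB.4.las present and no DB.5.DB.3, DB.5.DB.2, or DB.5.DB.1 files, `remaining_daligner_cmd "-v -b -k16 -t100 -Dlocalhost:1234 -r1 -j16" DB.5 DB.4 DB.3 DB.2 DB.1` prints `daligner -v -b -k16 -t100 -Dlocalhost:1234 -r1 -j16 DB.5 DB.3 DB.2 DB.1`, the command from question 2 with the original port. The helper only prints the command, so it can be reviewed before running.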

With -t100, I could see memory occasionally spiking severely, depending on the sequence content of the blocks being compared. This is probably why the jobs terminate: they exceed the physical memory limit, spill into virtual memory, and then run so slowly that they get knocked off the queue before all comparisons complete. But I'm guessing. Regardless, it is better to use the -M parameter to set the maximum amount of memory available to daligner, and not use the -t parameter at all. By setting, say, -M16, daligner will self-adjust -t so that its memory use is capped at 16 GB.
In our experience with blocks of 200-250Mbp, the -t value that daligner chooses this way gives more than enough sensitivity.
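Following this advice means replacing -t100 with a memory cap in every daligner line of the generated script. A minimal stopgap, assuming the script is a plain text file with one daligner command per line (as in the line quoted at the top of this thread), is to rewrite it with sed. `use_memory_cap` and `run_daligner.sh` are hypothetical names, and the cap is given in gigabytes of memory; regenerating the script may be cleaner if your HPC.daligner setup allows it.

```shell
# Hypothetical helper: in lines starting with "daligner ", remove any
# -t<int> option and insert -M<GB> right after the command name.
# All other lines pass through unchanged; the result goes to stdout.
use_memory_cap() {
  script=$1   # e.g. run_daligner.sh (hypothetical file name)
  mem_gb=$2   # memory cap in GB, e.g. 16
  sed -E -e '/^daligner /{
    s/ -t[0-9]+//
    s/^daligner /daligner -M'"$mem_gb"' /
  }' "$script"
}
```

Running `use_memory_cap run_daligner.sh 16 > run_daligner.M16.sh` would turn the line from this thread into `daligner -M16 -v -b -k16 -Dlocalhost:1234 -r1 -j16 DB.5 DB.4 DB.3 DB.2 DB.1`. Writing to a new file keeps the original script intact for comparison.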

-- Gene


peritob commented 6 years ago

Thanks for your help.