uio-cels / NucDiff

In-depth characterization and annotation of differences between two sets of DNA sequences
Mozilla Public License 2.0

Run with no output #10

Open davidaray opened 4 years ago

davidaray commented 4 years ago

I'm running NucDiff on a UGE cluster, attempting to use 10 processors. The process starts and appears to go well, but I never get any output. I'm including my command line and the output.

If you can offer any help, I'd very much appreciate it. I'm trying to demonstrate this package for my class of graduate students.

Thanks.

David

Command: nucdiff --proc 10 --vcf yes ../fly_nanopore.contigs.fasta ../fly_polished.fasta . nucdiff

Output:

1: PREPARING DATA

2,3: RUNNING mummer AND CREATING CLUSTERS

reading input file "/lustre/work/daray/canu/fly_qsub/nucdiff/nucdiff.ntref" of length 136825466

construct suffix tree for sequence of length 136825466

(maximum reference length is 536870908)

(maximum query length is 4294967295)

process 1368254 characters per dot

....................................................................................................

CONSTRUCTIONTIME /home/daray/conda/opt/mummer-3.23/mummer /lustre/work/daray/canu/fly_qsub/nucdiff/nucdiff.ntref 66.30

reading input file "/lustre/work/daray/canu/fly_qsub/fly_polished.fasta" of length 137848239

matching query-file "/lustre/work/daray/canu/fly_qsub/fly_polished.fasta"

against subject-file "/lustre/work/daray/canu/fly_qsub/nucdiff/nucdiff.ntref"

COMPLETETIME /home/daray/conda/opt/mummer-3.23/mummer /lustre/work/daray/canu/fly_qsub/nucdiff/nucdiff.ntref 774.61

SPACE /home/daray/conda/opt/mummer-3.23/mummer /lustre/work/daray/canu/fly_qsub/nucdiff/nucdiff.ntref 265.78

4: FINISHING DATA

Not sure why the varying text sizes. Sorry about that.

kseniakh commented 4 years ago

Hi!

It seems that you've got output only from MUMmer but not from NucDiff. I really cannot advise you on anything here. Sorry :(

davidaray commented 4 years ago

I'm confused. I invoked NucDiff and it worked well enough to run MUMmer. That suggests it's installed properly.

nucdiff --proc 10 --vcf yes ../fly_nanopore.contigs.fasta ../fly_polished.fasta . nucdiff

But then, when it prepares to move on to the comparison itself, which is the part NucDiff does, it stops with no error message.

That seems like a NucDiff problem. Am I mistaken?

kseniakh commented 4 years ago

I misunderstood your first message. Sorry about that. Let's try to find out what's going on.

Can you use the output that MUMmer has already generated as input for NucDiff via the --delta_file option? Does it work?

Also try with 1 CPU. It won't make much difference in run time anyway.

davidaray commented 4 years ago

I just submitted a job with that request.

$ nucdiff --proc 1 --vcf yes --delta_file nucdiff.delta ../fly_nanopore.contigs.fasta ../fly_polished.fasta . nucdiff

How long would it take to see results? Nothing is appearing in any of the output folders or in the log files.

kseniakh commented 4 years ago

The time depends on the dataset you have. You can use the "top" command in bash to check whether it is running.
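As a rough sketch of that check, the helper below asks `ps` for the elapsed wall time and accumulated CPU time of a single process. The function name `proc_status` is made up for illustration, and the sketch assumes you are logged in to the compute node where the job is running.

```shell
# Hypothetical helper: print PID, elapsed wall time, CPU time and command
# name for one process. ps exits non-zero when the PID no longer exists.
proc_status() {
    ps -o pid= -o etime= -o time= -o comm= -p "$1"
}

# Example (the PID is a placeholder): proc_status 12345
```

If the CPU time keeps growing between two calls, the process is still consuming CPU rather than sitting idle.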

davidaray commented 4 years ago

It could be working.

186408 daray 20 0 5565856 5.267g 1408 R 100.0 2.8 15:23.52 delta-filter

I must go teach now. Will update later today. Thank you for the help.

davidaray commented 4 years ago

So, I just checked and there has been no change in 24 hours.

That being said, I'm still getting results from 'top'.

(base) compute-7-57:$ top | grep daray

186408 daray 20 0 5565856 5.296g 556 R 100.0 2.8 1465:05 delta-filter

The unfortunate reality, though, is that on this queue I have a 48-hour wall time. Thus, if it's not finished by this time tomorrow, I will need to start over on a different queue.

This run is set to examine the differences between two Drosophila mauritiana assemblies. One was based solely on Nanopore reads; the second was polished with Pilon using Illumina data. The assembly itself is very small for a eukaryote, ~124 Mb. Should such a small genome comparison take this long?

kseniakh commented 4 years ago

Since it is delta-filter that is running, this is not a NucDiff problem. However, to rule out any influence from NucDiff, you can run MUMmer on your own.
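A minimal sketch of such a standalone run is shown below. The function name `run_mummer_standalone` is hypothetical, and the choice of `--maxmatch` for nucmer and `-q` for delta-filter is an assumption about how NucDiff calls these tools by default; check the NucDiff documentation before relying on it.

```shell
# Hypothetical wrapper: align a query assembly to a reference with nucmer,
# then filter the alignments with delta-filter.
# The options below are assumed, not confirmed, to match NucDiff's defaults.
run_mummer_standalone() {
    ref=$1      # reference FASTA
    qry=$2      # query FASTA
    prefix=$3   # prefix for the output files

    # nucmer writes its alignments to "$prefix.delta"
    nucmer --maxmatch -p "$prefix" "$ref" "$qry" || return 1

    # delta-filter prints the filtered delta to standard output
    delta-filter -q "$prefix.delta" > "$prefix.filter"
}

# Example:
# run_mummer_standalone ../fly_nanopore.contigs.fasta ../fly_polished.fasta standalone
```

Running each step under `time` would show whether the slow part is the alignment or the filtering.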

A genome of this size should not normally take that long, but it is not impossible. It may be that one or both assemblies contain many contigs, or that the genome has many repeated regions. Both factors can increase the execution time.
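One quick way to check the first possibility is to count the sequences in each assembly. The sketch below counts FASTA header lines; the function name `count_contigs` is made up for illustration.

```shell
# Hypothetical helper: count the sequences in a FASTA file by counting
# the lines that start with ">".
count_contigs() {
    grep -c '^>' "$1"
}

# Example:
# count_contigs ../fly_nanopore.contigs.fasta
# count_contigs ../fly_polished.fasta
```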

To rule out problems with the cluster, you can run something small first to confirm that NucDiff is working as expected.
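One way to build such a small test, sketched here under the assumption that the first contigs of the two assemblies correspond to each other, is to take only the first few records from each FASTA file. The function name `first_records` and the output file names are placeholders.

```shell
# Hypothetical helper: print the first N records of a FASTA file.
# Usage: first_records N file.fasta
first_records() {
    awk -v n="$1" '
        /^>/     { seen++ }   # each header line starts a new record
        seen > n { exit }     # stop once record N+1 begins
                 { print }
    ' "$2"
}

# Example:
# first_records 1 ../fly_nanopore.contigs.fasta > small_ref.fasta
# first_records 1 ../fly_polished.fasta > small_query.fasta
# nucdiff small_ref.fasta small_query.fasta small_out test
```

If this small run finishes and writes its results, the installation and the cluster setup are probably fine, and the long run time is more likely due to the data.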