uio-cels / NucDiff

In-depth characterization and annotation of differences between two sets of DNA sequences
Mozilla Public License 2.0

NucDiff hangs after getting SNPs #23

Status: Open. El-Castor opened this issue 3 years ago

El-Castor commented 3 years ago

Hi,

I was trying to run your tool on MUMmer output (a delta file) to get all SNPs and gaps between two genomes.

Here is the command line I used:

nucdiff $inputREFfasta $inputQueryFasta $outputmtxDir $outputFileprefix3 --delta_file $deltafilePath --proc 6

The tool hangs after printing the messages below, and it is still stuck at the same step now. The .snps file is not empty, but the _filtered.snps file is.

Here is the console output:

(NuCdiff) cpichot@node10:/NetScratch/cpichot/genome_publi_ultimate_analysis/polymorphism_between_CmeloPublishedGenome/out$ nucdiff $inputREFfasta $inputQueryFasta $outputmtxDir $outputFileprefix3 --delta_file $deltafilePath --proc 6

Run NUCmer...

Find differences...

The difference detection inside fragments step is complete

Do you have any suggestions?

Thanks in advance!

kseniakh commented 3 years ago

Hi!

Something has definitely gone wrong. Which version of NucDiff are you using, and did you install it through conda or not?

El-Castor commented 3 years ago

Hi Kseniakh,

Thank you for your quick response.

I installed NucDiff and all its dependencies with conda, as described in your documentation. Here are the packages installed in the environment:


# packages in environment at /opt/share/FLOCAD/userspace/cpichot/miniconda3/envs/NuCdiff:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
biopython                 1.78             py39h3811e60_1    conda-forge
ca-certificates           2020.12.5            ha878542_0    conda-forge
certifi                   2020.12.5        py39hf3d152e_1    conda-forge
ld_impl_linux-64          2.35.1               hea4e1c9_1    conda-forge
libblas                   3.9.0                7_openblas    conda-forge
libcblas                  3.9.0                7_openblas    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 9.3.0               h5dbcf3e_17    conda-forge
libgfortran-ng            9.3.0               he4bcb1c_17    conda-forge
libgfortran5              9.3.0               he4bcb1c_17    conda-forge
libgomp                   9.3.0               h5dbcf3e_17    conda-forge
liblapack                 3.9.0                7_openblas    conda-forge
libopenblas               0.3.12          pthreads_h4812303_1    conda-forge
libstdcxx-ng              9.3.0               h2ae2ef3_17    conda-forge
mummer                    3.23                          4    bioconda
ncurses                   6.2                  h58526e2_4    conda-forge
nucdiff                   2.0.3              pyh864c0ab_1    bioconda
numpy                     1.19.5           py39hdbf815f_1    conda-forge
openssl                   1.1.1i               h7f98852_0    conda-forge
perl                      5.32.0               h36c2ea0_0    conda-forge
perl-threaded             5.26.0                        0    bioconda
pip                       20.3.3             pyhd8ed1ab_0    conda-forge
python                    3.9.1           hffdb5ce_3_cpython    conda-forge
python_abi                3.9                      1_cp39    conda-forge
readline                  8.0                  he28a2e2_2    conda-forge
setuptools                49.6.0           py39hf3d152e_3    conda-forge
sqlite                    3.34.0               h74cdb3f_0    conda-forge
tk                        8.6.10               h21135ba_1    conda-forge
tzdata                    2020f                he74cb21_0    conda-forge
wheel                     0.36.2             pyhd3deb0d_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge
kseniakh commented 3 years ago

Hi!

Can you list the files that were generated and which of them are empty?

When everything hangs, does the "top" command show that a python process is running, or is nothing running at that moment?

What are the sizes of the compared genomes, and how many sequences does each contain?

Ksenia
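The genome sizes and sequence counts asked about above can be collected with a short script. This is a minimal sketch, assuming plain (uncompressed) FASTA input; `fasta_stats` is a hypothetical helper, not part of NucDiff or MUMmer.

```shell
# fasta_stats FILE...
# Hypothetical helper (not part of NucDiff). Prints, for each plain-text FASTA
# file: file name, number of records, and total sequence length in bases
# (spaces, tabs and CR characters inside sequence lines are ignored).
fasta_stats() {
  local f
  for f in "$@"; do
    awk -v name="$f" '
      /^>/ { n++; next }                         # header line: count a record
      { gsub(/[ \t\r]/, ""); len += length($0) } # sequence line: add its bases
      END { printf "%s\t%d sequences\t%.0f bp\n", name, n, len }
    ' "$f"
  done
}
```

With the variables from the original command, `fasta_stats "$inputREFfasta" "$inputQueryFasta"` would print one summary line per genome.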

El-Castor commented 3 years ago

Hi !

Here is a listing of the produced files:

-rw-r--r-- 1 cpichot utilisateurs  18869278 janv. 15 11:52 comp_DHL92_vs_CMiso_NuCDiff_1.coord
-rw-r--r-- 1 cpichot utilisateurs 385070870 janv. 15 14:37 comp_DHL92_vs_CMiso_NuCDiff_1.snps
-rw-r--r-- 1 cpichot utilisateurs    153201 janv. 15 16:28 comp_DHL92_vs_CMiso_NuCDiff_2.coord
-rw-r--r-- 1 cpichot utilisateurs   6366388 janv. 15 16:30 comp_DHL92_vs_CMiso_NuCDiff_2.snps
-rw-r--r-- 1 cpichot utilisateurs  21259590 janv. 15 11:51 comp_DHL92_vs_CMiso_NuCDiff.coords
-rw-r--r-- 1 cpichot utilisateurs  49649790 janv. 15 11:45 comp_DHL92_vs_CMiso_NuCDiff.delta
-rw-r--r-- 1 cpichot utilisateurs  15699135 janv. 15 11:51 comp_DHL92_vs_CMiso_NuCDiff.filter
-rw-r--r-- 1 cpichot utilisateurs         0 janv. 15 11:51 comp_DHL92_vs_CMiso_NuCDiff_filtered.snps

As you can see, the _filtered.snps file is empty.

I don't understand what you mean when you ask whether the "top" command reports that a python command is running, but I don't see any error message when I launch NucDiff.

Both genomes are roughly 450 Mb. Do you think I have not allocated enough RAM?

kseniakh commented 3 years ago

Hi!

It is OK that the file _filtered.snps is empty.

As far as I can see, the tool has found the differences inside the fragments, but not between them. To rule out the most obvious causes, I need the following information:

  1. How many lines are there in comp_DHL92_vs_CMiso_NuCDiff.coords?
  2. How many sequences are there in each genome?
  3. When you run the tool and the message "The difference detection inside fragments step is complete" has already been printed, type "top" on the command line. You will see the list of running processes. I would like to know whether the process corresponding to NucDiff (I would expect to see python) is still running.
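The `top` check in item 3 can also be scripted, which is handy on a cluster node where an interactive session is inconvenient. The sketch below is a hypothetical helper, `find_procs`, not part of NucDiff; it assumes Linux, reads `/proc` directly, and prints each process whose command line contains a given pattern together with its scheduler state.

```shell
# find_procs PATTERN
# Hypothetical helper (not part of NucDiff); Linux-only, reads /proc.
# Prints "PID STATE COMMAND" for every process whose full command line
# contains PATTERN. STATE comes from /proc/PID/stat:
#   R = running, S = sleeping, D = waiting on I/O, Z = zombie.
find_procs() {
  local pattern="$1" dir pid cmd stat state
  for dir in /proc/[0-9]*; do
    pid="${dir#/proc/}"
    [ "$pid" = "$$" ] && continue            # skip the calling shell itself
    # cmdline is NUL-separated; a process may exit while we look at it,
    # so errors are silenced and the entry is skipped.
    cmd=$( { tr '\0' ' ' < "$dir/cmdline"; } 2>/dev/null ) || continue
    case "$cmd" in
      *"$pattern"*) ;;
      *) continue ;;
    esac
    { read -r stat < "$dir/stat"; } 2>/dev/null || continue
    state=${stat##*") "}                     # text after the last ") " ...
    state=${state%% *}                       # ... starts with the state field
    printf '%s %s %s\n' "$pid" "$state" "$cmd"
  done
}
```

Running `find_procs nucdiff` while the tool appears stuck would show whether the python process still exists; repeated `R` readings would suggest it is still computing rather than deadlocked, although that reading is only a heuristic.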

It would also be great if you could run the tool on some small genomes (or just two nearly identical sequences) to check whether you get the same problem.
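For the small test suggested above, a pair of nearly identical sequences can be generated synthetically, so the expected result is known in advance. This is a minimal sketch that relies only on awk; `make_toy_pair`, the fixed seed, and the `toy_*` file names are hypothetical choices for illustration.

```shell
# make_toy_pair OUTDIR LENGTH SNP_POS
# Hypothetical helper. Writes OUTDIR/toy_ref.fasta (a random ACGT sequence of
# LENGTH bases, fixed seed) and OUTDIR/toy_query.fasta (identical except for
# one substitution at the 1-based position SNP_POS).
make_toy_pair() {
  local outdir="$1" len="$2" pos="$3"
  if [ "$pos" -lt 1 ] || [ "$pos" -gt "$len" ]; then
    echo "SNP_POS must be between 1 and LENGTH" >&2
    return 1
  fi
  mkdir -p "$outdir"
  awk -v len="$len" -v pos="$pos" -v seed=42 -v dir="$outdir" '
    BEGIN {
      srand(seed)
      split("A C G T", base, " ")
      ref = ""
      for (i = 1; i <= len; i++) ref = ref base[int(rand() * 4) + 1]
      # pick any base that differs from the reference base at pos
      orig = substr(ref, pos, 1)
      for (j = 1; j <= 4; j++) if (base[j] != orig) { alt = base[j]; break }
      qry = substr(ref, 1, pos - 1) alt substr(ref, pos + 1)
      print ">toy_ref" > (dir "/toy_ref.fasta")
      print ref > (dir "/toy_ref.fasta")
      print ">toy_query" > (dir "/toy_query.fasta")
      print qry > (dir "/toy_query.fasta")
    }'
}
```

For example, `make_toy_pair toy 20000 10000` followed by `nucdiff toy/toy_ref.fasta toy/toy_query.fasta toy_out toy` uses the same positional arguments as the command in the original report; on a working installation one would expect this to finish quickly and report a single substitution.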

El-Castor commented 3 years ago

Hi Ksenia,

  1. The file comp_DHL92_vs_CMiso_NuCDiff.coords has 140550 lines. Here are the line counts for the _1.coord and _2.coord files:
(base) cpichot@node15:/NetScratch/cpichot/genome_publi_ultimate_analysis/polymorphism_between_CmeloPublishedGenome/out$ wc -l comp_DHL92_vs_CMiso_NuCDiff_1.coord
124858 comp_DHL92_vs_CMiso_NuCDiff_1.coord
(base) cpichot@node15:/NetScratch/cpichot/genome_publi_ultimate_analysis/polymorphism_between_CmeloPublishedGenome/out$ wc -l comp_DHL92_vs_CMiso_NuCDiff_2.coord
1021 comp_DHL92_vs_CMiso_NuCDiff_2.coord
  2. The first genome has 13 sequences in its FASTA file, and the other genome also has 13 sequences (13 chromosomes).
  3. I tried running NucDiff from the beginning (not from the delta file produced with MUMmer) with just chromosome 1, and it worked. The results folder is populated, but the _filtered.snps file is still empty. What do you think about that?

Here is a listing of the files produced:

  1433620 janv. 25 20:11 comp_DHL92_vs_CMiso_NuCDiff_test_1.coord
  45864485 janv. 25 20:11 comp_DHL92_vs_CMiso_NuCDiff_test_1.snps
    39904 janv. 25 20:29 comp_DHL92_vs_CMiso_NuCDiff_test_2.coord
  1205715 janv. 25 20:29 comp_DHL92_vs_CMiso_NuCDiff_test_2.snps
  1533380 janv. 25 20:11 comp_DHL92_vs_CMiso_NuCDiff_test.coords
  34719606 janv. 25 18:40 comp_DHL92_vs_CMiso_NuCDiff_test.delta
   957730 janv. 25 20:11 comp_DHL92_vs_CMiso_NuCDiff_test.filter
    0 janv. 25 20:11 comp_DHL92_vs_CMiso_NuCDiff_test_filtered.snps

And here is the results folder:

-rw-r--r-- 1 cpichot utilisateurs   165114 janv. 25 23:15 comp_DHL92_vs_CMiso_NuCDiff_test_query_additional.gff
-rw-r--r-- 1 cpichot utilisateurs   252114 janv. 25 23:15 comp_DHL92_vs_CMiso_NuCDiff_test_query_blocks.gff
-rw-r--r-- 1 cpichot utilisateurs 21371054 janv. 25 23:15 comp_DHL92_vs_CMiso_NuCDiff_test_query_snps.gff
-rw-r--r-- 1 cpichot utilisateurs  1093183 janv. 25 23:15 comp_DHL92_vs_CMiso_NuCDiff_test_query_struct.gff
-rw-r--r-- 1 cpichot utilisateurs   227771 janv. 25 23:15 comp_DHL92_vs_CMiso_NuCDiff_test_ref_additional.gff
-rw-r--r-- 1 cpichot utilisateurs   151191 janv. 25 23:15 comp_DHL92_vs_CMiso_NuCDiff_test_ref_blocks.gff
-rw-r--r-- 1 cpichot utilisateurs 21721331 janv. 25 23:15 comp_DHL92_vs_CMiso_NuCDiff_test_ref_snps.gff
-rw-r--r-- 1 cpichot utilisateurs  1145023 janv. 25 23:15 comp_DHL92_vs_CMiso_NuCDiff_test_ref_struct.gff
-rw-r--r-- 1 cpichot utilisateurs      800 janv. 25 23:15 comp_DHL92_vs_CMiso_NuCDiff_test_stat.out
kseniakh commented 3 years ago

Hi,

NucDiff can take a long time when there are many structural differences between two particular sequences, and that is probably the case here. I would run the tool once more and wait a bit longer. The alternative is to run the tool on each chromosome separately. That is not optimal, but it will at least speed up the process, narrow down the problem (if there is one), and show which chromosome causes it.
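The per-chromosome approach can be scripted. This is a minimal sketch under stated assumptions: `split_fasta` and `run_per_chromosome` are hypothetical helpers, the N-th record of the reference is assumed to correspond to the N-th record of the query (worth checking, since chromosome names and order can differ between assemblies), and `nucdiff` is called with the same positional arguments and `--proc 6` as in the original command. Each run logs start and finish times to stderr, so a chromosome that hangs shows up as a "started" line with no matching "finished".

```shell
# split_fasta IN.fasta OUTDIR
# Hypothetical helper. Writes each FASTA record to OUTDIR/seqN.fasta
# (N = 1, 2, ... in file order) and prints the path of each file it creates.
split_fasta() {
  local in="$1" outdir="$2"
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    /^>/ {
      if (out != "") close(out)      # finish the previous record
      n++
      out = dir "/seq" n ".fasta"
      print out                      # report the new file on stdout
    }
    out != "" { print > out }        # copy header and sequence lines
  ' "$in"
}

# run_per_chromosome REF.fasta QUERY.fasta OUTROOT
# Hypothetical driver. ASSUMPTION: record N of REF pairs with record N of QUERY.
run_per_chromosome() {
  local ref="$1" qry="$2" outroot="$3" r q i status
  split_fasta "$ref" "$outroot/ref_split" > /dev/null
  split_fasta "$qry" "$outroot/query_split" > /dev/null
  for r in "$outroot"/ref_split/seq*.fasta; do
    [ -f "$r" ] || continue                  # no records: glob did not match
    i=${r##*/seq}; i=${i%.fasta}             # record index N from seqN.fasta
    q="$outroot/query_split/seq$i.fasta"
    if [ ! -f "$q" ]; then
      echo "no query record $i; skipping" >&2
      continue
    fi
    echo "chr$i: started $(date)" >&2
    nucdiff "$r" "$q" "$outroot/chr$i" "chr$i" --proc 6
    status=$?
    echo "chr$i: finished $(date) (exit status $status)" >&2
  done
}
```

A call such as `run_per_chromosome "$inputREFfasta" "$inputQueryFasta" per_chr 2> per_chr.log` would run the pairs one after another; the runs are kept sequential so that each one gets the full `--proc 6` budget and the log points at a single chromosome if something stalls.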