metaDMG-dev / metaDMG-cpp

Difference in names on Linux (server) and Mac (local) #3

Closed ChristianMichelsen closed 8 months ago

ChristianMichelsen commented 2 years ago

When I run the same command, the names of the generated files differ between Mac and Linux (i.e. on my local laptop vs. the server).

In particular, I run:

    ./metaDMG-cpp lca -bam raw_data/alignment.bam -outnames data/lca/tmp/alignment/alignment -names raw_data/names-mdmg.dmp -nodes raw_data/nodes-mdmg.dmp -acc2tax raw_data/acc2taxid.map.gz -fix_ncbi 0 -tempfolder data/lca/tmp/alignment/

which generates either alignment.bamalignment.bam.bin on my Mac or acc2taxid.map.gzalignment.bam.bin on the server.

The output on my Mac is:

    -> Will output lca results in file:     'data/lca/tmp/alignment/alignment.lca.gz'
    -> [thread1] Will read header
    -> Will output lca distribution in file:        'data/lca/tmp/alignment/alignment.stat'
    -> Will output lca weight in file:      'data/lca/tmp/alignment/alignment.wlca'
    -> Will output log info (problems) in file: 'data/lca/tmp/alignment/alignment.log'
    -> [thread1] Done reading header: 0.00 sec, header contains: 2
    -> -bam     raw_data/alignment.bam
    -> -names   raw_data/names-mdmg.dmp
    -> -nodes   raw_data/nodes-mdmg.dmp
    -> -acc2tax raw_data/acc2taxid.map.gz
    -> -simscoreLow 0.000000
    -> -simscoreHigh    1.000000
    -> -editdistMin 0
    -> -editdistMax 10
    -> -outnames    data/lca/tmp/alignment/alignment
    -> -minmapq 0
    -> -lca_rank    species
    -> -norank2species  0
    -> -howmany 5
    -> -fix_ncbi    0
    -> -weighttype  0
    -> -tempfolder  38245184
    -> Starting to extract (acc->taxid) from binary file: 'raw_data/acc2taxid.map.gz'
    -> Checking if exits: 'data/lca/tmp/alignment/alignment.bamalignment.bam.bin'
    -> Checking if bimnary file exists. dodump=1
    -> opening file: 'data/lca/tmp/alignment/alignment.bamalignment.bam.bin' mode: 'wb'
    -> Setting threads to: 4
    -> opening file: 'raw_data/acc2taxid.map.gz' mode: 'rb'
    -> Setting threads to: 2
    -> At linenr: 200001 in 'raw_data/acc2taxid.map.gz'         -> Number of entries to use from accesion to taxid: 2, time taken: 0.00 sec
    -> [raw_data/names-mdmg.dmp] Number of unique names (column1): 65949 with third column 'scientific name'
    -> Number of unique names (column1): 65949 from file: raw_data/nodes-mdmg.dmp parent.size():65949 child.size():0
    -> Number of entries with level information: 46
[hts]   -> editMin:0 editmMax:10 scoreLow:0.000000 scoreHigh:1.000000 minlength:-1 discard: 516 prefix: data/lca/tmp/alignment/alignment howmany: 5 skipnorank: 1 weighttype: 0
    -> Will dump: 'data/lca/tmp/alignment/alignment.bdamage.gz' this contains damage patterns for: 2 items
    -> Setting threads to: 4
    -> Number of species with reads that map uniquely: 2
    -> [ALL done] walltime used =  0.00 sec

and the output on Linux is:

    -> Will output lca results in file:     'data/lca/tmp/alignment/alignment.lca.gz'
    -> [thread1] Will read header
    -> [thread1] Done reading header: 0.00 sec, header contains: 2
    -> Will output lca distribution in file:        'data/lca/tmp/alignment/alignment.stat'
    -> Will output lca weight in file:      'data/lca/tmp/alignment/alignment.wlca'
    -> Will output log info (problems) in file: 'data/lca/tmp/alignment/alignment.log'
    -> -bam     raw_data/alignment.bam
    -> -names   raw_data/names-mdmg.dmp
    -> -nodes   raw_data/nodes-mdmg.dmp
    -> -acc2tax raw_data/acc2taxid.map.gz
    -> -simscoreLow 0.000000
    -> -simscoreHigh    1.000000
    -> -editdistMin 0
    -> -editdistMax 10
    -> -outnames    data/lca/tmp/alignment/alignment
    -> -minmapq 0
    -> -lca_rank    species
    -> -norank2species  0
    -> -howmany 5
    -> -fix_ncbi    0
    -> -weighttype  0
    -> -tempfolder  28299616
    -> Starting to extract (acc->taxid) from binary file: 'raw_data/acc2taxid.map.gz'
    -> Checking if exits: 'data/lca/tmp/alignment/acc2taxid.map.gzalignment.bam.bin'
    -> Checking if bimnary file exists. dodump=1
    -> opening file: 'data/lca/tmp/alignment/acc2taxid.map.gzalignment.bam.bin' mode: 'wb'
    -> Setting threads to: 4
    -> opening file: 'raw_data/acc2taxid.map.gz' mode: 'rb'
    -> Setting threads to: 2
    -> At linenr: 200001 in 'raw_data/acc2taxid.map.gz'         -> Number of entries to use from accesion to taxid: 2, time taken: 0.00 sec
    -> [raw_data/names-mdmg.dmp] Number of unique names (column1): 65949 with third column 'scientific name'
    -> Number of unique names (column1): 65949 from file: raw_data/nodes-mdmg.dmp parent.size():65949 child.size():0
    -> Number of entries with level information: 46
[hts]   -> editMin:0 editmMax:10 scoreLow:0.000000 scoreHigh:1.000000 minlength:-1 discard: 516 prefix: data/lca/tmp/alignment/alignment howmany: 5 skipnorank: 1 weighttype: 0
    -> Will dump: 'data/lca/tmp/alignment/alignment.bdamage.gz' this contains damage patterns for: 2 items
    -> Setting threads to: 4
    -> Number of species with reads that map uniquely: 2
    -> [ALL done] walltime used =  1.00 sec

ChristianMichelsen commented 2 years ago

This is not a major issue, since the .bin file is not really used by my end of the pipeline; I was just curious about it.

ANGSD commented 8 months ago

This should not be an issue in new versions.
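
For readers curious about the differing prefixes reported above (alignment.bam on Mac, acc2taxid.map.gz on Linux): one possible explanation, not confirmed anywhere in this thread and not checked against the metaDMG-cpp source, is the POSIX `basename()` function. POSIX allows it to return a pointer to static storage that a later call overwrites, so using two `basename()` results in one expression can yield the same string twice on one platform and two different strings on another. The sketch below shows a platform-independent way to build such a name with C++17 `std::filesystem`. The helper name `make_bin_name` is hypothetical, and the naming scheme (temp folder, then the acc2tax basename, then the BAM basename, then `.bin`) is only inferred from the Linux log.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper, NOT taken from metaDMG-cpp.
// Assumed naming scheme (inferred from the Linux log in this issue):
//   <tempfolder>/<basename of acc2tax><basename of bam>.bin
std::string make_bin_name(const std::string &tempfolder,
                          const std::string &acc2tax,
                          const std::string &bam) {
    // filename() returns a new path object, so the two basenames are
    // independent copies and cannot overwrite each other.
    const std::string acc_base = fs::path(acc2tax).filename().string();
    const std::string bam_base = fs::path(bam).filename().string();

    // operator/ inserts a separator only if tempfolder lacks a trailing one.
    return (fs::path(tempfolder) / (acc_base + bam_base + ".bin")).string();
}
```

Called with the arguments from the command in this issue (`data/lca/tmp/alignment/`, `raw_data/acc2taxid.map.gz`, `raw_data/alignment.bam`), this produces the name seen in the Linux log on both platforms, because each basename is held in its own `std::string`.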