kamimrcht / ELECTOR

ELECTOR: EvaLuator of Error Correction Tools for lOng Reads
GNU Affero General Public License v3.0

swag*': No such file or directory #2

Open jnarayan81 opened 6 years ago

jnarayan81 commented 6 years ago

Some swag* files seem to be missing!

➜  ELECTOR git:(master) ✗ python3 elector.py -perfect /media/urbe/ARCgenomic/toyGenome/toy.fasta -corrected /home/urbe/Tools/canu/Linux-amd64/bin/simTest20x/simTest20x.correctedReads.fasta -uncorrected /media/urbe/ARCgenomic/toyGenome/toyLongReads.fasta
- Mean that a large amount of nuc has been handled: 100000000
**-rm: cannot remove '/home/urbe/Tools/ELECTOR/swag*': No such file or directory**

Traceback (most recent call last):
  File "elector.py", line 168, in <module>
    main()
  File "elector.py", line 136, in main
    nbReads, throughput, precision, recall, correctBaseRate, errorRate, smallReads, wronglyCorReads, percentGCRef, percentGCCorr, numberSplit, meanMissing, numberExtended, meanExtension, minLength, indelsubsUncorr, indelsubsCorr , homoInsU, homoDeleU, homoInsC,  homoDeleC, homoInsUMean,  homoDeleUMean, homoInsCMean, homoDeleCMean = computeStats.outputRecallPrecision(sortedCorrectedFileName, outputDirPath, logFile, smallReads, wronglyCorReads, reportedHomopolThreshold, size_corrected_read_threshold, 0, 0, soft)
  File "/home/urbe/Tools/ELECTOR/computeStats.py", line 159, in outputRecallPrecision
    nbReads, throughput, precision, recall, corBasesRate, errorRate, extendedBases, missingSize,  GCRateRef, GCRateCorr, indelsubsUncorr, indelsubsCorr, numberHomopolymersInserInCorrected, numberHomopolymersDeleInCorrected , numberHomopolymersInserInUncorrected , numberHomopolymersDeleInUncorrected,    meanLengthDeleHomopolymersInUncorrected , meanLengthInserHomopolymersInUncorrected ,    meanLengthInserHomopolymersInCorrected ,    meanLengthDeleHomopolymersInCorrected  = computeMetrics(outDir + "/msa.fa", outMetrics, correctedFileName, reportedHomopolThreshold)
  File "/home/urbe/Tools/ELECTOR/computeStats.py", line 448, in computeMetrics
    upperCasePositions = getUpperCasePositions(correctedReadsList, lines)
  File "/home/urbe/Tools/ELECTOR/computeStats.py", line 624, in getUpperCasePositions
    upperCasePositions[-1] = [False] * len(correctedMsa)
**IndexError: list assignment index out of range**

I also tried the following, but it ended with this error:

➜ ELECTOR git:(master) ✗ python3 elector.py -perfect /media/urbe/ARCgenomic/toyGenome/toy.fasta -uncorrected /media/urbe/ARCgenomic/toyGenome/toyLongReads.fasta -corrected /home/urbe/Tools/canu/Linux-amd64/bin/simTest20x/simTest20x.correctedReads.fasta -threads 40 -split -corrector canu

  • Mean that a large amount of nuc has been handled: 100000000
-rm: cannot remove '/home/urbe/Tools/ELECTOR/swag*': No such file or directory

Traceback (most recent call last):
  File "elector.py", line 168, in <module>
    main()
  File "elector.py", line 136, in main
    nbReads, throughput, precision, recall, correctBaseRate, errorRate, smallReads, wronglyCorReads, percentGCRef, percentGCCorr, numberSplit, meanMissing, numberExtended, meanExtension, minLength, indelsubsUncorr, indelsubsCorr , homoInsU, homoDeleU, homoInsC, homoDeleC, homoInsUMean, homoDeleUMean, homoInsCMean, homoDeleCMean = computeStats.outputRecallPrecision(sortedCorrectedFileName, outputDirPath, logFile, smallReads, wronglyCorReads, reportedHomopolThreshold, size_corrected_read_threshold, 0, 0, soft)
  File "/home/urbe/Tools/ELECTOR/computeStats.py", line 155, in outputRecallPrecision
    nbReads, throughput, precision, recall, corBasesRate, errorRate, extendedBases, missingSize, GCRateRef, GCRateCorr, indelsubsUncorr, indelsubsCorr, numberHomopolymersInserInCorrected, numberHomopolymersDeleInCorrected , numberHomopolymersInserInUncorrected , numberHomopolymersDeleInUncorrected, meanLengthDeleHomopolymersInUncorrected , meanLengthInserHomopolymersInUncorrected , meanLengthInserHomopolymersInCorrected , meanLengthDeleHomopolymersInCorrected = computeMetrics(outDir + "/msa_" + soft + ".fa", outMetrics, correctedFileName, reportedHomopolThreshold )
  File "/home/urbe/Tools/ELECTOR/computeStats.py", line 444, in computeMetrics
    msa = open(fileName, 'r')
FileNotFoundError: [Errno 2] No such file or directory: '/home/urbe/Tools/ELECTOR/msa_canu.fa'

morispi commented 6 years ago

Hi,

Sorry for the late answer.

I can see you used "-perfect /media/urbe/ARCgenomic/toyGenome/toy.fasta". Is your "toy.fasta" file an actual genome? If so, this is most probably why you are getting this error.

When using the "-perfect" option, ELECTOR assumes the input file contains "perfect / reference" reads, that is, reads without sequencing errors. If you wish to provide a genome directly to ELECTOR, and let it find the perfect / reference reads itself, you should use the "-reference" option instead.

Please tell me if that helps!

Pierre
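To help answer the "is toy.fasta an actual genome?" question, here is a minimal sketch that summarizes a FASTA file. It is not part of ELECTOR; the function name `fasta_summary` is hypothetical, and how to interpret the numbers is left to you (a handful of very long records suggests a genome, many shorter ones suggest reads).

```python
def fasta_summary(path):
    """Return (record_count, total_bases, longest_record) for a FASTA file.

    Hypothetical helper for illustration; ELECTOR does not provide it.
    """
    lengths = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                lengths.append(0)  # a header line starts a new record
            elif lengths:
                lengths[-1] += len(line)  # sequence line of the current record
    return len(lengths), sum(lengths), max(lengths, default=0)
```

Calling `fasta_summary("toy.fasta")` on the file passed to -perfect should make it fairly obvious whether -reference is the switch to use instead.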

jnarayan81 commented 6 years ago

Hi @morispi, thanks for the reply. I tried -reference on the example data:

python3 elector.py -reference example/example_reference.fasta -corrected example/corrected_reads.fasta -uncorrected Sim -threads 46 -simulator simlord -output test2

but it has been busy doing something for the last 12 hours, which I think is not normal! Is that command right? Or did I miss something?

morispi commented 6 years ago

Hey,

If you wish to use the SimLoRD simulated reads from the toy example along with the -reference switch, you should run this command, as indicated in the README:

python3 elector.py -reference example/example_reference.fasta -corrected example/Simlord/correctedReads.fasta -uncorrected example/Simlord/simulatedReads -simulator simlord

Indeed, the reads given to the -corrected and -uncorrected switches must have matching headers. It seems that what you provided to the -uncorrected switch (Sim) is a nonexistent file, which is why the program never stops.
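A quick pre-flight check along these lines can catch both problems before a long run. This is only a sketch for plain FASTA inputs: the function names are hypothetical, and treating the first whitespace-delimited token of each header as the read identifier is an assumption, not ELECTOR's documented matching rule.

```python
def read_ids(fasta_path):
    """Collect read identifiers from a FASTA file.

    Assumption: the identifier is the first whitespace-delimited token
    of the header line. ELECTOR's own matching rule may differ.
    """
    ids = set()
    with open(fasta_path) as handle:  # raises FileNotFoundError if missing
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def compare_headers(corrected_path, uncorrected_path):
    """Count identifiers shared by, or unique to, the two read sets."""
    corrected = read_ids(corrected_path)
    uncorrected = read_ids(uncorrected_path)
    return {
        "shared": len(corrected & uncorrected),
        "only_corrected": len(corrected - uncorrected),
        "only_uncorrected": len(uncorrected - corrected),
    }
```

A nonexistent input fails immediately with `FileNotFoundError` instead of hanging, and a `shared` count of zero suggests the two files do not belong together.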

Quick use guide of the parameters:

For simulated reads:

  • -corrected LR.fasta: here you must provide a file of corrected long reads in fasta format.
  • -uncorrected SimLRPrefix: here you must provide the prefix of the simulated reads files. They must be the original reads for which the correction has been provided to the -corrected switch. For example, if you simulated a set of reads with SimLoRD in the test/simulation/ directory, this directory will contain the following files: simReads.h5, simReads.fastq, and simReads.fastq.sam. You must thus provide test/simulation/simReads (the common prefix of all the files) to the -uncorrected switch.
  • -reference ref.fasta: here you must provide a file containing the reference genome in fasta format.
  • -simulator name: here you must provide the name of the simulator that was used to simulate the reads (we currently only support SimLoRD and NanoSim).
  • -corrector name: here you must provide the name of the correction method that was used to correct the reads (see the list of supported correctors in the README).
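To double-check a prefix before passing it to -uncorrected, something like the sketch below may help. The suffix list is taken from the SimLoRD example above; which of these files ELECTOR actually reads is not stated here, so adjust the list as needed. The helper name is hypothetical, and NanoSim output is not covered.

```python
import os

# Suffixes from the SimLoRD example above (simReads.h5, simReads.fastq,
# simReads.fastq.sam). Assumption: all three are expected; edit if not.
SIMLORD_SUFFIXES = (".h5", ".fastq", ".fastq.sam")


def missing_simulation_files(prefix, suffixes=SIMLORD_SUFFIXES):
    """Return the expected files that do not exist for the given prefix."""
    return [prefix + suffix for suffix in suffixes
            if not os.path.isfile(prefix + suffix)]
```

For instance, `missing_simulation_files("test/simulation/simReads")` returns an empty list when every expected file is present, while a prefix such as `Sim` that matches nothing returns all three candidate paths.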

For real reads:

  • -corrected LR.fasta: here you must provide a file of corrected long reads in fasta format.
  • -uncorrected rawReads.fasta: here you must provide a file of uncorrected long reads in fasta format. They must be the original reads for which the correction has been provided to the -corrected switch.
  • -reference ref.fasta: here you must provide a file containing the reference genome in fasta format.
  • -corrector name: here you must provide the name of the correction method that was used to correct the reads (see the list of supported correctors in the README).
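If you launch ELECTOR from a script, the two modes can be assembled as in the sketch below. It uses only the switches that appear in this thread (-reference, -corrected, -uncorrected, -simulator, -corrector, -threads, -output); the function name and its keyword arguments are hypothetical, and the README remains the authority on valid values.

```python
import shlex


def build_elector_command(reference, corrected, uncorrected,
                          simulator=None, corrector=None,
                          threads=None, output=None):
    """Assemble an elector.py argument list (hypothetical helper).

    Pass `simulator` for simulated reads (uncorrected is then a prefix);
    leave it as None for real reads (uncorrected is then a fasta file).
    """
    cmd = ["python3", "elector.py",
           "-reference", reference,
           "-corrected", corrected,
           "-uncorrected", uncorrected]
    if simulator:
        cmd += ["-simulator", simulator]
    if corrector:
        cmd += ["-corrector", corrector]
    if threads is not None:
        cmd += ["-threads", str(threads)]
    if output:
        cmd += ["-output", output]
    return cmd


def as_shell_line(cmd):
    """Render the argument list as a copy-pastable shell command."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

The returned list can be passed to `subprocess.run`, and `as_shell_line` is handy for logging the exact command that was executed.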

Please tell me if that helps, and whether it's clear enough. If you don't manage to run ELECTOR on the datasets you want, just describe what you want to do, and I'll help by providing the command line.

Cheers, P