Closed by ssnn-airr 7 years ago
Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).
This is indeed a MIGEC header - specifically the consensus sequence output.
Mike is making a change in MIGEC v1.2.7 that should resolve the `Bio.SeqIO.index` incompatibility.
I've added a `migec` mode to ConvertHeaders in bc50b30 that should work with the new MIGEC version.
Original comment by Scott Christley (Bitbucket: [Scott Christley](https://bitbucket.org/Scott Christley)).
I didn't run with only those sequences; I just cut and pasted a few sequences from the top of each file. I kinda assumed it was an issue with the read_id (got read IDs on the brain today!), but I will try a run with just those few sequences. I've no idea what the format is. I could try to contact the user to ask, but we cannot rely on getting a response. FYI, this isn't a user-reported error; I watch jobs and proactively check on errors. Users have a tendency to ignore errors and/or not bother to report them...
Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).
Weird. I'll look at it. It fails with just those top 4 sequences, yes? I suspect it's because the header format isn't supported. Is that MiGEC format?
It would need to use the `UMI:TCGGCCAACAAA` bit to pair the reads and doesn't know how to extract it. I guess we could also add an optional flag to ignore the headers and just blindly trust that the reads are paired in file order. Makes me nervous, but it's an option.
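To make the two options concrete, here is a rough sketch of pairing by UMI versus pairing by file order. This is not pRESTO code: the function names and the example headers are hypothetical, and the only thing taken from the thread is that the header carries a `UMI:<bases>` field.

```python
import re

# Assumption: the UMI appears in the header as "UMI:<bases>", possibly
# followed by further fields, as in the UMI:TCGGCCAACAAA fragment above.
UMI_PATTERN = re.compile(r"UMI:([ACGTN]+)")


def extract_umi(header):
    """Return the UMI bases from a header line, or None if absent."""
    match = UMI_PATTERN.search(header)
    return match.group(1) if match else None


def pair_by_umi(headers_1, headers_2):
    """Pair headers from two files that share the same UMI."""
    by_umi = {extract_umi(h): h for h in headers_2}
    by_umi.pop(None, None)  # drop headers that had no UMI field
    return [(h, by_umi[extract_umi(h)])
            for h in headers_1 if extract_umi(h) in by_umi]


def pair_by_order(headers_1, headers_2):
    """Blindly trust file order: the i-th reads are assumed to be mates."""
    return list(zip(headers_1, headers_2))


# Hypothetical headers for illustration only; file 2 is deliberately shuffled.
r1 = ["@MIG.0.R1 UMI:TCGGCCAACAAA:5", "@MIG.1.R1 UMI:ACGTACGTACGT:3"]
r2 = ["@MIG.1.R2 UMI:ACGTACGTACGT:3", "@MIG.0.R2 UMI:TCGGCCAACAAA:5"]
print(pair_by_umi(r1, r2))
print(pair_by_order(r1, r2))
```

With the shuffled second file, pairing by UMI still matches the mates, while pairing by order silently pairs the wrong reads, which is why the blind option is risky.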
Oh, BTW... just noticed. `--nproc 48` probably won't be appreciably faster than `--nproc 20`. Scaling is not the best with Python's multiprocessing library...
Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).
The warning is from an incompatibility between pRESTO v0.5.2 and newer versions of NumPy/SciPy. It should be fixed in v0.5.3. Can you update pRESTO?
The unrecognized type is probably because it thinks `.t4.fastq` is the file extension instead of `.fastq`. That shouldn't be happening, because it would mean that `os.path.splitext()` isn't well behaved. I'll take a look tomorrow.
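For reference, a quick standard-library check of what `os.path.splitext()` does with names like these. The dotted name `Bb_R1.t4.fastq` is an assumed original (the thread only shows the renamed `Bb_R1_t4.fastq`), and this says nothing about how pRESTO itself decides the file type.

```python
import os.path


def extension(name):
    """Return only the extension part that os.path.splitext() reports."""
    return os.path.splitext(name)[1]


# Assumed dotted original name, plus the renamed file from this thread.
for name in ("Bb_R1.t4.fastq", "Bb_R1_t4.fastq"):
    root, ext = os.path.splitext(name)
    print(f"{name}: root={root!r}, ext={ext!r}")
```

`splitext()` splits only at the last dot, so both names should come back with `.fastq` as the extension.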
Original comment by Scott Christley (Bitbucket: [Scott Christley](https://bitbucket.org/Scott Christley)).
Yep, you can test with these files.
Original comment by Scott Christley (Bitbucket: [Scott Christley](https://bitbucket.org/Scott Christley)).
Yeah, I'll see about upgrading pRESTO for the next VDJServer release.
I tried changing the file names, but same error:
AssemblePairs.py align -1 Bb_R1_t4.fastq -2 Bb_R2_t4.fastq --coord illumina --rc tail --outname Bb_R1_t4
START> AssemblePairs
COMMAND> align
FILE1> Bb_R1_t4.fastq
FILE2> Bb_R2_t4.fastq
COORD_TYPE> illumina
ALPHA> 1e-05
MAX_ERROR> 0.3
MIN_LEN> 8
MAX_LEN> 1000
SCAN_REVERSE> False
NPROC> 48
ERROR: File Bb_R1_t4.fastq has an unrecognized type
Original report by Scott Christley (Bitbucket: [Scott Christley](https://bitbucket.org/Scott Christley)).
I have a user who uploaded a set of paired-end read files and is trying to run pRESTO on them, but it fails during the assemble stage.
Technically I probably shouldn't be specifying `--coord illumina`, but I tried the other possibilities and none of them worked. I also tried running ConvertHeaders with all the different conversion methods, but they all produce the unrecognized type error. Here are the first few sequences in each file:
R1:
R2: