m-vieira closed 4 years ago
hmm yeah you're right, that looks like a problem on our end. It's trying to find the mean J SHM frequency for new-allele clustering, but ends up taking the mean of an empty list. I'm not sure whether it should be possible to get a zero-length list there, though.
This looks like just stderr, could you also paste stdout?
As to the curses error, I don't think that is actually causing a crash, but I'm also not sure what causes it. It sounds familiar but I don't think I've ever run into it myself. What OS are you on?
I'm on a linux cluster. Here's stdout, thanks!
non-human species 'mouse', turning on allele clustering
parameter dir '_output/.._data_sequence_data_8-5' does not exist, so caching a new set of parameters before running action 'partition'
caching parameters
vsearch: 69154 / 69198 v annotations (44 failed) with 140 v genes in 5.5 sec
keeping 59 / 261 v genes
smith-waterman (new-allele clustering)
vsearch: 69153 / 69198 v annotations (45 failed) with 59 v genes in 10.6 sec
running 16 procs for 69198 seqs
running 18 procs for 354 seqs
info for 68910 / 69198 = 0.996 (288 failed)
kept 2702 (0.039) unproductive
removed 38944 / 68910 = 0.57 duplicate sequences after trimming framework insertions (leaving 29966)
water time: 118.1 (ig-sw 22.5 processing 1.1)
clustering for new alleles
removing 29965/29966 sequences with v_5p or j_3p deletions
collapsed 1 input sequences into 0 representatives from 1 clones (removed 1 clones with >= 8 j mutations)
mutation among all cluster representatives: v / j = nan / nan = nan
removing 29965/29966 sequences with v_5p or j_3p deletions
collapsed 1 input sequences into 0 representatives from 1 clones (removed 1 clones with >= 8 j mutations)
mutation among all cluster representatives: v / j = nan / nan = nan
ok so the clustering-style allele inference only looks at full-length sequences, i.e. ones that have all of V and all of J (most reads these days are full length anyway, and handling partial reads would make things way more complicated). On this sample that requirement removes all but one of the 30k input sequences from consideration for allele inference, and then that one is removed by another requirement (the cut on clones with >= 8 J mutations). If you expect that your sequences are not full length, this just means the inferred germline will be less accurate, but it is all expected, except the bit where it crashes, which I've fixed here (note that it usually takes an hour or so for docker hub to update).
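For anyone hitting something similar, here is a rough sketch of the failure mode described above: filter down to full-length sequences, then take a mean over whatever is left. This is not the actual partis code; the field names (`v_5p_del`, `j_3p_del`, `j_mut_freq`) and the helpers `full_length` and `mean_or_nan` are made up for illustration, and the real fix may look different.

```python
import math


def mean_or_nan(values):
    """Mean of values, or nan if there are none (instead of raising)."""
    values = list(values)
    if not values:
        return math.nan
    return sum(values) / len(values)


def full_length(seqs):
    """Keep only sequences with no V 5' or J 3' deletions (hypothetical fields)."""
    return [s for s in seqs if s["v_5p_del"] == 0 and s["j_3p_del"] == 0]


# Toy input: every sequence is truncated at one end, so nothing survives.
seqs = [
    {"v_5p_del": 12, "j_3p_del": 0, "j_mut_freq": 0.02},
    {"v_5p_del": 0, "j_3p_del": 3, "j_mut_freq": 0.05},
]

kept = full_length(seqs)
j_freq = mean_or_nan(s["j_mut_freq"] for s in kept)

if math.isnan(j_freq):
    # Nothing to cluster on: skip the step rather than crash.
    print("no full-length sequences left; skipping new-allele clustering")
```

The point is just that the mean has to handle an empty list explicitly, and the caller has to decide what to do when nothing passes the full-length requirement.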
Great. Thank you so much!
I'm running partis partition on several mouse heavy chain BCR datasets. For most datasets it runs successfully (which suggests this is not a problem with my local installation on a cluster), but for some I get the error message below. Any ideas would be appreciated.
Cheers,
Marcos