stschiff / sequenceTools

parsing error #22

Closed npsonis closed 2 years ago

npsonis commented 2 years ago

Hi, I have been using pileupCaller to create pseudohaploid data at the 1240K SNP positions with no problems. I decided to run it again to do the same with a custom panel of SNP positions, and I ran into the following parsing error:

pileupCaller: SeqFormatException "Error while parsing: Failed reading: satisfyWith. Error occurred when trying to parse this chunk: \" . 1 0.041208 4120796 X X\n . 1 0.041209 4120878 G A\n . 1 0.041209 4120888 G T\n . 1 0.041211 4121071 C T\n . 1 0.041211 4121105 G X\n . 1 0.041213 4121349 A G\n . 1 0.041216 4121584 G A\n . 1 0.041216 4121607 C X\n . 1 0.041217 4121653 A X\n . 1 0.041217 4121696 A X\n . 1 0.041218 4121779 C T\n . 1 0.041218 4121812 C T\n . 1 0.041218 4121839 C A\n . 1 0.041223 4122344 A C\n . 1 0.041224 4122362 C T\n . 1 0.041226 4122571 T X\n . 1 0.041232 4123204 T X\n . 1 0.041235 4123486 C T\n . 1 0.041237 4123651 A G\n . 1 0.041238 4123849 C G\n . 1 0.041245 4124548 T X\n . 1 0.041252 4125174 A X\n . 1 0.041258 4125772 C T\n . 1 0.041259 4125950 T X\n . 1 0.041260 4126007 T X\n . 1 0.041264 4126404 A C\n . 1 0.041265 4126481 A X\n . 1 0.041265 4126523 G X\n . 1 0.041272 4127240 G X\n . 1 0.041276 4127635 A G\n . 1 0.041285 4128523 A C\n . 1 0.041309 4130870 C A\n . 1 0.041312 4131181 T X\n . 1 0.041320 4132018 C T\n . 1 0.041328 4132777 T C\n . 1 0.041331 4133075 G T\n . 1 0.041331 4133096 C X\n . 1 0.041331 4133143 G A\n . 1 0.041332 4133190 G X\n . 1 0.041332 4133241 G X\n . 1 0.041347 4134715 A X\n . 1 0.041360 4135956 A G\n . 1 0.041399 4139931 T X\n . 1 0.041405 4140513 T X\n . 1 0.041408 4140814 A G\n . 1 0.041409 4140876 A X\n . 1 0.041412 4141194 T G\n . 1 0.041414 4141352 G X\n . 1 0.041459 4145875 C X\n . 1 0.041460 4146028 C X\n . 1 0.041467 4146671 C T\n . 1 0.041469 4146876 C T\n . 1 0.041471 4147088 T G\n . 1 0.041480 4147961 C X\n . 1 0.041480 4148023 T X\n . 1 0.041493 4149304 G X\n . 1 0.041499 4149901 C T\n . 1 0.041505 4150527 A X\n . 1 0.041518 4151751 C X\n . 1 0.041522 4152206 T X\n . 1 0.041527 4152715 A G\n . 1 0.041541 4154138 C X\n . 1 0.041550 4155013 A G\n . 1 0.041562 4156178 A X\n . 1 0.041572 4157198 G X\n . 1 0.041578 4157773 C G\n . 1 0.041590 4158955 G T\n . 1 0.041597 4159657 A G\n . 1 0.041609 4160924 C A\n . 
1 0.041620 4161981 T C\n . 1 0.041668 4166814 C T\n . 1 0.041673 4167285 C T\n . 1 0.041676 4167635 C T\n . 1 0.041683 4168275 C T\n . 1 0.041687 4168703 T C\n . 1 0.041698 4169786 C X\n . 1 0.041704 4170412 T X\n . 1 0.041716 4171595 A G\n . 1 0.041725 4172533 T X\n . 1 0.041741 4174116 C X\n . 1 0.041743 4174257 C G\n . 1 0.041743 4174259 T X\n . 1 0.041750 4174989 A T\n . 1 0.041753 4175337 A X\n . 1 0.041758 4175784 A G\n . 1 0.041761 4176106 T C\n . 1 0.041778 4177770 A X\n . 1 0.041785 4178502 T X\n . 1 0.041790 4178980 T C\n . 1 0.041802 4180210 A X\n . 1 0.041806 4180638 C X\n . 1 0.041808 4180842 T X\n . 1 0.041809 4180936 A C\n . 1 0.041834 4183400 A C\n . 1 0.041835 4183472 T X\n . 1 0.041850 4185002 C T\n . 1 0.041851 4185126 G X\n . 1 0.041858 4185803 A X\n . 1 0.041863 4186309 C T\n . 1 0.041864 4186415 G T\n . 1 0.041864 4186420 G T\n . 1 0.041865 4186473 G X\n . 1 0.041880 4187952 C X\n . 1 0.041888 4188759 C T\n . 1 0.041896 4189558 A G\n . 1 0.041896 4189629 C X\n . 1 0.041907 4190685 G X\n . 1 0.041925 4192471 A X\n . 1 0.041927 4192666 G A\n . 1 0.041934 4193410 G X\n . 1 0.041934 4193426 A X\n . 1 0.041938 4193806 C T\n . 1 0.041939 4193873 A G\n . 1 0.041964 4196428 G A\n . 1 0.041968 4196795 A G\n . 1 0.041968 4196809 T X\n . 1 0.041969 4196858 C X\n . 1 0.041975 4197508 G X\n . 1 0.041975 4197515 C T\n . 1 0.041980 4197987 A G\n . 1 0.041980 4198042 C X\n . 1 0.041987 4198654 A T\n . 1 0.041987 4198684 A C\n . 1 0.041987 4198701 C T\n . 1 0.041987 4198705 A X\n . 1 0.041988 4198776 T X\n . 1 0.041996 4199646 C T\n . 1 0.042005 4200491 A G\n . 1 0.042028 4202818 A G\n . 1 0.042030 4202993 T C\n . 1 0.042046 4204588 T X\n . 1 0.042046 4204593 C T\n . 1 0.042046 4204633 C T\n . 1 0.042046 4204646 C X\n . 1 0.042047 4204653 A X\n . 1 0.042047 4204741 C T\n . 1 0.042048 4204786 A T\n . 1 0.042052 4205176 A T\n . 1 0.042063 4206288 T X\n . 1 0.042070 4206962 C X\n . 1 0.042070 4206964 C X\n . 1 0.042073 4207343 C T\n . 1 0.042074 4207398 A X\n . 
1 0.042085 4208542 T X\n . 1 0.04\""

The line of the pileup file containing the first SNP position mentioned above (which is not the first position in the bed file that I provided) is this: 1 4120796 A 0 0 0 1 c A 0 2 Ct EE

Any ideas on how to overcome this issue? Thanks, N.

stschiff commented 2 years ago

Hmm, it seems to trip over your Eigenstrat file. I just checked... you have dots as SNP IDs in the first column... and in theory my code should parse that alright... could you perhaps send me a minimal version of both the Eigenstrat file and the bam file to reproduce this problem?

npsonis commented 2 years ago

Here are the first 10 lines of the .snp file:

. 1 0.000547 54716 T X
. 1 0.000553 55299 T X
. 1 0.000738 73841 T X
. 1 0.000860 86028 C X
. 1 0.000916 91561 A X
. 1 0.001030 102951 C T
. 1 0.001249 124897 G X
. 1 0.005321 532080 C X
. 1 0.005343 534315 A X
. 1 0.005347 534698 A X

It makes sense (at least to me) for the IDs to be dots, since I have not provided any information there (such as the rs codes that the 1240K panel has). I suspect that something is wrong with the format of the .snp file (the delimiters, maybe?), but I cannot figure it out. I am sending the file itself too. I generated the .snp file with convertf from ped/map files, using the "familynames: NO" option.

downsampled.snp.txt

EDIT

I just ran it again with a smaller .snp file (one without the SNP positions that trigger the issue above), and it works fine. So the format is not the problem after all.
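
For readers who want to rule out delimiter or field problems in a .snp file like the one quoted above, here is a minimal Python sketch of a line-by-line check. It is not part of pileupCaller or sequence-formats. The function names are hypothetical, and the six-column layout and the allowed allele symbols (including "X" for an unknown allele) are assumptions taken from the sample lines in this thread.

```python
# Assumption: alleles are single characters from this set; "X" marks an
# unknown allele, as in the sample .snp lines quoted in this thread.
ALLOWED_ALLELES = set("ACGTX")


def check_snp_line(line):
    """Return a list of problems found in one .snp line (empty if none)."""
    problems = []
    if "\r" in line:
        problems.append("carriage return (Windows line ending)")
    fields = line.split()  # splits on any run of spaces or tabs
    if len(fields) != 6:
        problems.append(f"expected 6 fields, found {len(fields)}")
        return problems
    snp_id, chrom, gen_pos, phys_pos, ref, alt = fields
    try:
        float(gen_pos)
    except ValueError:
        problems.append(f"genetic position is not a number: {gen_pos!r}")
    if not phys_pos.isdigit():
        problems.append(f"physical position is not an integer: {phys_pos!r}")
    for name, allele in (("reference", ref), ("alternative", alt)):
        if allele not in ALLOWED_ALLELES:
            problems.append(f"unexpected {name} allele: {allele!r}")
    return problems


def check_snp_file(path):
    """Yield (line_number, problem) pairs for every problem in the file."""
    # newline="" keeps "\r" characters so Windows line endings are visible.
    with open(path, newline="") as handle:
        for number, line in enumerate(handle, start=1):
            for problem in check_snp_line(line.rstrip("\n")):
                yield number, problem
```

Calling `list(check_snp_file("downsampled.snp.txt"))` would list any lines that deviate from these assumptions. An empty result only means the file matches this simplified check, not that every parser will accept it.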

stschiff commented 2 years ago

I'll take a look. A bit overworked right now, but won't get forgotten!

stschiff commented 2 years ago

OK, I could not reproduce this error with my newest sequence-formats library. The error above indicates there is a problem in parsing the SNP file, but neither your down-sampled SNP file nor the chunk which is output above poses a problem as far as I can see. Perhaps you can check which version you're using, first, by running pileupCaller --version?

stschiff commented 2 years ago

Otherwise you'll need to email me and send me actual file samples and a command line that reproduce the error for me to investigate. It's not something obvious, I'm afraid.

npsonis commented 2 years ago

Hi, I am running 1.4.0.5. When I try to install 1.4.1, I get the following:


Downloading lts-17.0 build plan ... RedownloadInvalidResponse Request { host = "raw.githubusercontent.com" port = 443 secure = True requestHeaders = [] path = "/fpco/lts-haskell/master//lts-17.0.yaml" queryString = "" method = "GET" proxy = Nothing rawBody = False redirectCount = 10 responseTimeout = ResponseTimeoutDefault requestVersion = HTTP/1.1 } "/home/kluser1/.stack/build-plan/lts-17.0.yaml" (Response {responseStatus = Status {statusCode = 404, statusMessage = "Not Found"}, responseVersion = HTTP/1.1, responseHeaders = [("Connection","keep-alive"),("Content-Length","14"),("Content-Security-Policy","default-src 'none'; style-src 'unsafe-inline'; sandbox"),("Strict-Transport-Security","max-age=31536000"),("X-Content-Type-Options","nosniff"),("X-Frame-Options","deny"),("X-XSS-Protection","1; mode=block"),("Content-Type","text/plain; charset=utf-8"),("X-GitHub-Request-Id","8118:C9B3:120374E:12E403F:61ADF8CE"),("Accept-Ranges","bytes"),("Date","Mon, 06 Dec 2021 11:49:34 GMT"),("Via","1.1 varnish"),("X-Served-By","cache-mxp6936-MXP"),("X-Cache","MISS"),("X-Cache-Hits","0"),("X-Timer","S1638791374.175529,VS0,VE188"),("Vary","Authorization,Accept-Encoding,Origin"),("Access-Control-Allow-Origin","*"),("X-Fastly-Request-ID","5135447d12a176b64a1144bd4fa6f45b518cf9e4"),("Expires","Mon, 06 Dec 2021 11:54:34 GMT"),("Source-Age","0")], responseBody = (), responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose})


I tried both via stack and hackage and from source via stack.

I will email you the requested files, though.

Best

stschiff commented 2 years ago

That's a stack error... weird. Have you upgraded stack? You could try stack upgrade or stack update (not sure which one).

npsonis commented 2 years ago

Hi again,

I updated stack and managed to update sequencetools to 1.5.1, too. Then the issue was resolved. Thanks for the help! You may close it.

stschiff commented 2 years ago

Thanks!