stschiff / sequenceTools

Add plink support #19

Closed teepean closed 3 years ago

teepean commented 3 years ago

Add support for outputting to plink format.

teepean commented 3 years ago

I have no idea how command-line parsing works in Haskell, so this does not work properly yet. The following line is apparently incorrect, as the plink option does not work:

parseFormat = (EigenstratFormat <$> parseEigenstratPrefix <*> parseSamplePopName) <|> (PlinkFormat <$> parsePlinkPrefix <*> parseSamplePopName) <|> pure FreqSumFormat

stschiff commented 3 years ago

Thanks for the initiative, I'll take a look!

stschiff commented 3 years ago

@teepean I've pushed some changes. Could you please check whether this works as expected, if you have some files to test it with? Note that I've changed the stack snapshot (now lts-17.0), so if you git pull from this branch and then run stack install on your computer, it may take a while, as some dependencies have been updated and your GHC version will be updated too.

stschiff commented 3 years ago

And I just now saw that you found a problem with the parseFormat line... weird, I think that looks correct. Let me know whether it works now. I have time to test it out myself later or tomorrow.

teepean commented 3 years ago

Doesn't seem to be working.

Invalid option `--plinkOut'

or

Invalid option `-p'

Did you mean one of these?
    -h
    -d
    -e

But if I modify this line

parseFormat = (EigenstratFormat <$> parseEigenstratPrefix <*> parseSamplePopName) <|> (PlinkFormat <$> parsePlinkPrefix <*> parseSamplePopName) <|> pure FreqSumFormat

to this, plinkOut starts working but then eigenstratOut stops working:

parseFormat = (PlinkFormat <$> parsePlinkPrefix <*> parseSamplePopName) <|> (EigenstratFormat <$> parseEigenstratPrefix <*> parseSamplePopName) <|> pure FreqSumFormat

stschiff commented 3 years ago

I cannot reproduce this. I just changed the code so that pileupCaller only outputs the options it gets. Can you please confirm this behaviour, which is as expected?

$ pileupCaller --randomHaploid -f tt -e fp --sampleNames t1,t2
(RandomCalling,False,1,AllSites,EigenstratFormat "fp" "Unknown","tt",["t1","t2"])
$ pileupCaller --randomHaploid -f tt -p fp --sampleNames t1,t2
(RandomCalling,False,1,AllSites,PlinkFormat "fp" "Unknown","tt",["t1","t2"])
$ pileupCaller --randomHaploid -f tt --sampleNames t1,t2
(RandomCalling,False,1,AllSites,FreqSumFormat,"tt",["t1","t2"])

So the output format is parsed correctly depending on which flag you use (-p for Plink, -e for Eigenstrat, or none for FreqSum format).
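
For readers unfamiliar with the `<|>` chain in parseFormat, here is a minimal sketch of the same "first matching branch, else default" shape. It is not pileupCaller's actual parser: it uses plain Maybe from base instead of a command-line parsing library, and the Opts type, the chooseFormat and samplePopName helpers, and the option keys "e", "p" and "samplePopName" are all hypothetical names introduced only for illustration.

```haskell
import Control.Applicative ((<|>))

-- Output formats named in the thread. The two String fields are the
-- output prefix and the sample population name.
data OutFormat
  = EigenstratFormat String String
  | PlinkFormat String String
  | FreqSumFormat
  deriving (Show, Eq)

-- Hypothetical stand-in for already-tokenised options: (flag, value) pairs.
type Opts = [(String, String)]

-- Population name, falling back to the "Unknown" default seen in the
-- thread's output. The key "samplePopName" is an assumption of this sketch.
samplePopName :: Opts -> Maybe String
samplePopName opts = lookup "samplePopName" opts <|> Just "Unknown"

-- Same shape as parseFormat: try Eigenstrat, then Plink, else FreqSum.
-- With Maybe, (<|>) returns the first branch that is a Just.
chooseFormat :: Opts -> Maybe OutFormat
chooseFormat opts =
      (EigenstratFormat <$> lookup "e" opts <*> samplePopName opts)
  <|> (PlinkFormat <$> lookup "p" opts <*> samplePopName opts)
  <|> pure FreqSumFormat

main :: IO ()
main = do
  print (chooseFormat [("e", "fp")])  -- Just (EigenstratFormat "fp" "Unknown")
  print (chooseFormat [("p", "fp")])  -- Just (PlinkFormat "fp" "Unknown")
  print (chooseFormat [])             -- Just FreqSumFormat
```

In this Maybe-based model, branch order only matters when both flags are supplied, in which case the leftmost branch wins. A real option parser may resolve alternatives differently, so this sketch should not be read as an explanation of the behaviour reported above.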

stschiff commented 3 years ago

I'll merge this now, test some more and push a new release when it works. Thanks for the input @teepean.

teepean commented 3 years ago

Sorry for the late reply. It does work as expected.

pileupCaller --randomHaploid -f tt -e fp --sampleNames t1,t2
(RandomCalling,False,1,AllSites,EigenstratFormat "fp" "Unknown","tt",["t1","t2"])
pileupCaller --randomHaploid -f tt -p fp --sampleNames t1,t2
(RandomCalling,False,1,AllSites,PlinkFormat "fp" "Unknown","tt",["t1","t2"])
pileupCaller --randomHaploid -f tt --sampleNames t1,t2
(RandomCalling,False,1,AllSites,FreqSumFormat,"tt",["t1","t2"])