Closed by stsmall 6 years ago
@dschride answered my question in a separate email. I am including the answer here:
cat "simulationInputFile" | msMaskAllRows "maskFileName" | python removeNedOutColumnsFromMsFile.py stdin > "outputFile"
Here simulationInputFile is the path to your ms-style formatted simulation input, and maskFileName is the path to the mask file you are asking about. The masked version will be written to outputFile. I will illustrate the format of the mask file by example:
0 0.1 0.2
0 0.3 0.4
//
0 0.05 0.06
0 0.5 0.9
0 0.95 0.99
//
The above file can be used to mask a simulation file with two replicates. In the first rep, all positions between 0.1 and 0.2 and between 0.3 and 0.4 will be masked (the 0 at the beginning of each line can be ignored). In the second rep (after the first //) we will mask three different stretches of sequence. Note that the lines with numbers are space-separated.
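To make the format concrete, here is a minimal Python sketch that reads a mask file laid out as described above: one space-separated line per masked stretch (a leading 0 that is ignored, then the start and end positions), with // closing each replicate. The function names parse_mask and is_masked are hypothetical and are not part of msMaskAllRows; treating the interval endpoints as inclusive is also an assumption, since the description above does not say how boundaries are handled.

```python
def parse_mask(text):
    """Return one list of (start, end) intervals per replicate.

    Assumes one interval per line ("0 start end") and "//" after each
    replicate, as in the example above. Not the msMaskAllRows parser.
    """
    replicates = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line == "//":
            # "//" closes the current replicate
            replicates.append(current)
            current = []
            continue
        _ignored, start, end = line.split()  # leading 0 is ignored
        current.append((float(start), float(end)))
    if current:
        # tolerate a file that lacks the final "//"
        replicates.append(current)
    return replicates


def is_masked(position, intervals):
    """True if position falls in any interval (inclusive bounds assumed)."""
    return any(start <= position <= end for start, end in intervals)


EXAMPLE = """0 0.1 0.2
0 0.3 0.4
//
0 0.05 0.06
0 0.5 0.9
0 0.95 0.99
//
"""

masks = parse_mask(EXAMPLE)
for rep_index, intervals in enumerate(masks, start=1):
    print(f"replicate {rep_index}: {intervals}")
```

With the example file, masks holds two replicates: the first with two masked stretches and the second with three. A call such as is_masked(0.15, masks[0]) can then be used to check whether a given position in the first replicate would be masked.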
Hi @andrewkern, @dschride, I have already masked repeats and low-confidence calls in my FASTA files and would like to also include a mask file with the training data. I am a little unclear about the format of this file. Would you be able to provide a brief example, please? Thanks, @stsmall