konsumer / powerball

My attempt at predicting Powerball numbers with nodejs

Powerball draws dataset #6

Status: Open. slax0r opened this issue 8 years ago.

slax0r commented 8 years ago

Just came across your wonderful script this morning. It's pretty sweet. I do want to point out a couple of things regarding the Powerball draws dataset, though, since I noticed that you are only using the winning draws for analysis and generation.

Powerball runs several test draws before the one you actually see on TV, and one or two after. So by using only the winning draws, you are limiting your app's accuracy. All draw data can be obtained here: http://www.powerball.com/powerball/testpb.doc

It's an MS Word doc, which is kind of a PITA, but it is handled pretty easily with the catdoc utility and a little awk and sort magic.

I'd also point out that Powerball uses 6 different ball sets and a few different machines. All of that info is in the Word doc too, should you consider doing something with mechanical probabilities.

If you decide to try your hand at Mega Millions, all of the above applies, but the URL becomes: http://www.powerball.com/megamillions/testmm.doc

Again awesome tool.

konsumer commented 8 years ago

This is awesome info! I love the idea of mechanical probabilities using dry runs, but am unsure how to implement that.

I will make a lil function to extract the data. If you'd like to submit a PR for the mechprob stuff, that'd be awesome!

slax0r commented 8 years ago

I implemented (read: hacked together) the mechprob stuff quite a while ago in Java, using an Apache commons-collections Bag (actually a bag of bags): https://commons.apache.org/proper/commons-collections/javadocs/api-2.1.1/org/apache/commons/collections/Bag.html. I have no idea what a JavaScript version of that would look like. I'll see if I can dig the Java code out, though; it's been a long while.
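For comparison, the Bag idea maps fairly directly onto a JavaScript Map from item to count, and a "bag of bags" onto a Map of those Bags keyed by ball set. What follows is a minimal sketch. Bag, whiteCountsByBallSet and the ballSet field on each draw are hypothetical names chosen for illustration. They are not part of the powerball lib or the Apache API, and the example draws are made up.

```javascript
// Minimal multiset ("Bag"): remembers how many times each item was added.
// Hypothetical class, loosely modelled on the commons-collections Bag idea.
class Bag {
  constructor () {
    this.counts = new Map()
  }

  add (item, n = 1) {
    this.counts.set(item, this.getCount(item) + n)
  }

  getCount (item) {
    return this.counts.get(item) || 0
  }
}

// "Bag of bags": one Bag of white-ball counts per ball set.
// Assumes each draw has a hypothetical ballSet field and a white array.
function whiteCountsByBallSet (draws) {
  const bags = new Map()
  for (const draw of draws) {
    if (!bags.has(draw.ballSet)) bags.set(draw.ballSet, new Bag())
    const bag = bags.get(draw.ballSet)
    for (const n of draw.white) bag.add(n)
  }
  return bags
}

// Made-up draws: 1 comes up twice with ball set 'A' and once with set 'B'.
const bagsBySet = whiteCountsByBallSet([
  { ballSet: 'A', white: [1, 2, 3, 4, 5] },
  { ballSet: 'A', white: [1, 6, 7, 8, 9] },
  { ballSet: 'B', white: [1, 2, 10, 11, 12] }
])
console.log(bagsBySet.get('A').getCount(1)) // 2
console.log(bagsBySet.get('B').getCount(1)) // 1
```

A Map is used instead of a plain object so that keys keep their original type (ball numbers stay numbers).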

Not sure my JS-fu is quite there yet, but I'll give it a shot.

Will probably need that extractor first though.

konsumer commented 8 years ago

I think this might do the same as frequencies(). I'm trying to work out a doc-reading solution, but at the moment I'm on a Windows machine, and it's surprisingly hard to find a way to read them.

slax0r commented 8 years ago

It does the same thing as frequencies(), but only after segregating the draws by ball set.

konsumer commented 8 years ago

frequencies() currently segregates them by white/red, too.
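White and red tallies need to stay separate because the same number can come up both as a white ball and as the Powerball. Below is a rough sketch of that kind of split. countByColor is a hypothetical name used for illustration, not the lib's actual frequencies() API. The sketch assumes draw objects with a white array and a red number, like the ones numbers() returns further down this thread, and the two draws are made up.

```javascript
// Hypothetical sketch: tally how often each number was drawn, keeping white
// balls and the red Powerball in separate tallies so they never share a count.
function countByColor (draws) {
  const white = {}
  const red = {}
  for (const draw of draws) {
    for (const n of draw.white) white[n] = (white[n] || 0) + 1
    red[draw.red] = (red[draw.red] || 0) + 1
  }
  return { white, red }
}

// Two made-up draws: 8 appears once as a white ball and once as the red ball.
const counts = countByColor([
  { white: [3, 8, 52, 58, 68], red: 8 },
  { white: [3, 15, 36, 18, 19], red: 20 }
])
console.log(counts.white[3], counts.white[8], counts.red[8]) // 2 1 1
```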

I made an initial stab at a doc parser, since I couldn't find anything that worked cross-platform. It's extremely specific to the Powerball doc file, but it should keep working as the doc is updated:

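// Download the Powerball test-draw .doc and parse its draw table into objects.
// Needs a global fetch (Node 18+, or the node-fetch package on older Node).
function numbers(){
  return fetch('http://www.powerball.com/powerball/testpb.doc')
    .then(res => res.text())
    .then(blob => {
      // Keep the text after the last '08/31/05' marker, up to the ETX (\x03)
      // character that ends the table, then split it into rows and cells.
      return ('08/31/05' + blob.split('08/31/05').pop().split('\x03')[0])
        .split('\r')                  // rows are separated by carriage returns
        .filter(row => row !== '')
        .map(row => row.split('\t'))  // cells are separated by tabs
    })
    .then(rows => {
      return rows.map(line => {
        return {
          date: new Date(line[0]).getTime(),
          white: line.slice(1, 6).map(v => parseInt(v, 10)),       // cells 1-5: white balls
          red: parseInt(line[8], 10),                               // cell 8: red Powerball
          powerplay: line[9] === '--' ? 1 : parseInt(line[9], 10),  // '--' is treated as 1x
          drawType: line[12],                                       // e.g. 'Pre-test', 'Draw', 'Post-test'
          extra: [6, 7, 10, 11].map(e => parseInt(line[e], 10))     // columns not yet identified
        }
      })
    })
}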

There is extra info that I'm not sure what to do with (stored in extra), but this should get us started. It outputs the same format as my old numbers() function, but doesn't go back as far, because the doc doesn't have the older data. I double-checked the last few "Draw" items, and they check out!
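To check the column mapping without downloading the doc, the per-row part of numbers() can be pulled out into a pure function. This is only a sketch. parseRow is a hypothetical helper name. The sample row is reconstructed from the 'Pre-test' object in the example output below, using the column positions from numbers(), and is not copied from the doc itself.

```javascript
// Hypothetical helper: convert one tab-separated row of the test-draw table
// into a draw object, using the same column positions as numbers().
function parseRow (row) {
  const cells = row.split('\t')
  return {
    date: new Date(cells[0]).getTime(),
    white: cells.slice(1, 6).map(v => parseInt(v, 10)),
    red: parseInt(cells[8], 10),
    powerplay: cells[9] === '--' ? 1 : parseInt(cells[9], 10),
    drawType: cells[12],
    extra: [6, 7, 10, 11].map(i => parseInt(cells[i], 10))
  }
}

// Row reconstructed from the 'Pre-test' example output.
const row = ['02/13/16', '3', '27', '52', '58', '68', '9', '41', '8', '--', '11', '24', 'Pre-test'].join('\t')
console.log(parseRow(row))
```

The white, red, powerplay and extra values should match the 'Pre-test' object in the example output. The date value depends on the machine's time zone, because Node parses a string like '02/13/16' as local time.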

konsumer commented 8 years ago

Example output:

[
  { date: 1455350400000,
    white: [ 3, 27, 52, 58, 68 ],
    red: 8,
    powerplay: 1,
    drawType: 'Pre-test',
    extra1: [ 9, 41, 11, 24 ] },
  { date: 1455350400000,
    white: [ 7, 15, 36, 18, 19 ],
    red: 20,
    powerplay: 2,
    drawType: 'Draw',
    extra1: [ 9, 41, 11, 24 ] },
  { date: 1455350400000,
    white: [ 40, 57, 36, 12, 35 ],
    red: 9,
    powerplay: 1,
    drawType: 'Post-test',
    extra1: [ 9, 41, 11, 24 ] }
]
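Since every row carries a drawType, the test draws slax0r mentioned can be kept or dropped with a simple grouping step. This is a minimal sketch that assumes draw objects shaped like the output above (trimmed to the fields it needs). splitByDrawType is a hypothetical helper, not part of the lib.

```javascript
// Hypothetical helper: group draws by their drawType
// ('Pre-test', 'Draw' and 'Post-test' in the output above).
function splitByDrawType (draws) {
  const groups = {}
  for (const draw of draws) {
    if (!groups[draw.drawType]) groups[draw.drawType] = []
    groups[draw.drawType].push(draw)
  }
  return groups
}

const draws = [
  { white: [3, 27, 52, 58, 68], red: 8, drawType: 'Pre-test' },
  { white: [7, 15, 36, 18, 19], red: 20, drawType: 'Draw' },
  { white: [40, 57, 36, 12, 35], red: 9, drawType: 'Post-test' }
]
const groups = splitByDrawType(draws)
console.log(groups.Draw.length) // 1: only the official (winning) draw
console.log(draws.length)       // 3: all draws, including the test runs
```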
konsumer commented 8 years ago

Just added it here, but GitHub isn't showing the ETX character (\x03) in the split.

slax0r commented 8 years ago

That is very cool. The command line I used earlier today on my Linux box:

$ wget -qO- http://www.powerball.com/powerball/testpb.doc | catdoc | grep '^[0-9]' \
    | sed -E -e 's/\/3115/\/31\/15/g' -e 's/--/0/g' -e 's/\/([0-9][0-9])\t/\/20\1\t/g' \
    | awk '{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$9"\t"$10}' \
    | sort -t/ -k3hr >> pbresults.txt

which I then served to the powerball lib via a hacked-together little static Express script.

I think the only challenge I had to work out was that it has to be a DOS-format file (CRLF line endings), which I remedied with unix2dos.
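For anyone without sed and awk handy, the text-cleanup half of that pipeline could be redone in Node once the doc's text is available. The .doc extraction itself (catdoc) is not covered. This is a rough sketch. cleanLines is a hypothetical name, the function assumes whitespace-separated rows like catdoc's output, and the sort step is left out. It mirrors the grep/sed/awk steps above and ends every line with CRLF, as unix2dos would. The sample row is reconstructed from the example output earlier in the thread.

```javascript
// Hypothetical port of the grep/sed/awk cleanup above to Node.
// Input: plain text of the doc (e.g. catdoc output), one draw per line.
function cleanLines (text) {
  return text
    .split(/\r?\n/)
    .filter(line => /^[0-9]/.test(line))                // grep '^[0-9]'
    .map(line => line
      .replace(/\/3115/g, '/31/15')                      // repair '/3115' -> '/31/15'
      .replace(/--/g, '0')                               // '--' (no Power Play) -> 0
      .replace(/\/([0-9][0-9])\t/g, '/20$1\t'))          // 2-digit year -> 4-digit year
    .map(line => {
      const f = line.trim().split(/\s+/)                 // awk-style field splitting
      // awk $1..$6, $9, $10 -> date, five white balls, red ball, Power Play
      return [f[0], f[1], f[2], f[3], f[4], f[5], f[8], f[9]].join('\t')
    })
    .map(line => line + '\r\n')                          // CRLF endings, like unix2dos
    .join('')
}

const sample = 'Header line\n02/13/16\t3\t27\t52\t58\t68\t9\t41\t8\t--\t11\t24\tPre-test\n'
console.log(JSON.stringify(cleanLines(sample)))
// "02/13/2016\t3\t27\t52\t58\t68\t8\t0\r\n"
```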

Too bad catdoc isn't a Node module. Opportunity? I think so.

konsumer commented 8 years ago

There is textract, but I couldn't get it working on Windows. My little parsing function seems to work OK, so I added it to the library as numbersAll. Maybe eventually this will be the main numbers function.

konsumer commented 8 years ago

In case it wasn't clear, here is how to use it to get predictions.

slax0r commented 8 years ago

Yeah, unfortunately textract just execs catdoc rather than integrating it at the code level.