ncullen93 / pyBN

Bayesian Networks in Python
MIT License
146 stars, 55 forks

BDe score #20

Closed sonujose123 closed 7 years ago

sonujose123 commented 7 years ago

I created a BN from a BIF file (asia.bif), but when I try to compute the BDe score with 'lizards.csv', it fails.

Code -
from pyBN import *
import numpy as np
import os
from os.path import dirname

file = 'data/asia.bif'
bn = read_bn(file)
dpath = os.path.join(dirname(dirname(dirname(dirname(file)))),'data')
path = (os.path.join(dpath,'lizards.csv'))
data = np.loadtxt(path, dtype='int32',skiprows=1,delimiter=',')

print BDe(bn,data)

Error

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    print BDe(bn,data)
  File "/home/sonu/Documents/pyBN/pyBN-master/pyBN/learning/structure/score/bayes_scores.py", line 61, in BDe
    counts_dict = mle_fast(bn, data, counts=True, np=True)
  File "/home/sonu/Documents/pyBN/pyBN-master/pyBN/learning/parameter/mle.py", line 41, in mle_fast
    F[n]['values'] = list(nmp.unique(data[:,i]))
IndexError: index 3 is out of bounds for axis 1 with size 3

ncullen93 commented 7 years ago

I think this is because the network and the data don't match. The asia network has these nodes: ['asia', 'smoke', 'tub', 'bronc', 'lung', 'either', 'dysp', 'xray'], whereas the lizards dataset only has 3 columns (nodes). That is why the traceback reports index 3 as out of bounds for an axis of size 3. The BDe score measures how well a given BN fits a COMPATIBLE dataset (i.e. the nodes of the BN match up with the columns of the dataset). :)
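
A quick way to catch this kind of mismatch before scoring is to compare the node list with the number of columns in the data. The following is only a minimal sketch: `check_compatible` is a hypothetical helper (not part of pyBN), and the 3-column array is a stand-in for the real lizards.csv contents.

```python
import numpy as np

def check_compatible(nodes, data):
    """Raise ValueError unless `data` has exactly one column per node.

    `nodes` is a list of node names and `data` is a 2-D array.
    Hypothetical helper for illustration only, not a pyBN function.
    """
    data = np.asarray(data)
    if data.ndim != 2:
        raise ValueError("expected a 2-D array, got %d dimension(s)" % data.ndim)
    n_cols = data.shape[1]
    if n_cols != len(nodes):
        raise ValueError(
            "network has %d nodes but data has %d columns" % (len(nodes), n_cols)
        )
    return True

asia_nodes = ['asia', 'smoke', 'tub', 'bronc', 'lung', 'either', 'dysp', 'xray']

# Stand-in for a 3-column dataset such as lizards.csv (the values are placeholders).
three_col_data = np.zeros((10, 3), dtype='int32')

try:
    check_compatible(asia_nodes, three_col_data)
except ValueError as err:
    message = str(err)
    print(message)
```

Running a check like this first turns the low-level IndexError into a message that names the actual problem.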

ncullen93 commented 7 years ago

Note, you can try LEARNING the structure from the lizards.csv dataset and check the BDe score, OR you can generate your own random asia dataset and check the BDe score.
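
To give an idea of what generating a random dataset from a network means, here is a rough forward-sampling sketch in plain NumPy. It does not use pyBN's own sampling code. The two-node network and its probabilities are placeholders chosen for illustration, not values read from a BIF file.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable

# Placeholder two-node network: asia -> tub.
# These probabilities are made up for illustration.
p_asia = 0.1                          # P(asia = 1)
p_tub_given_asia = {0: 0.05, 1: 0.5}  # P(tub = 1 | asia)

def forward_sample(n_samples):
    """Sample each node after its parents, one row per sample."""
    samples = np.zeros((n_samples, 2), dtype='int32')
    for row in range(n_samples):
        asia = int(rng.rand() < p_asia)                  # root node first
        tub = int(rng.rand() < p_tub_given_asia[asia])   # then its child
        samples[row] = (asia, tub)
    return samples

data = forward_sample(1000)
print(data.shape)
```

The resulting array has one column per node, in the same order the nodes were sampled, which is the shape a score function would need for that network.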

sonujose123 commented 7 years ago

Thanks a lot for your quick response.

sonujose123 commented 7 years ago

Hi Nicholas,

I tried again with a modified CSV and asia.bif, but I am still facing the same issue. I got this error:

python test.py
['asia', 'smoke', 'tub', 'bronc', 'lung', 'either', 'dysp', 'xray']
('asia', 'tub')
('smoke', 'bronc')
('smoke', 'lung')
('tub', 'either')
('bronc', 'dysp')
('lung', 'either')
('either', 'dysp')
('either', 'xray')
/home/sonu/Documents/pyBN/pyBN-master/pyBN/learning/parameter/mle.py:48: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  F[rv]['cpt'] = nmp.histogram(data[:,rv], bins=bn.card(rv))[0]
Traceback (most recent call last):
  File "test.py", line 20, in <module>
    print BDe(bn,data)
  File "/home/sonu/Documents/pyBN/pyBN-master/pyBN/learning/structure/score/bayes_scores.py", line 61, in BDe
    counts_dict = mle_fast(bn, data, counts=True, np=True)
  File "/home/sonu/Documents/pyBN/pyBN-master/pyBN/learning/parameter/mle.py", line 48, in mle_fast
    F[rv]['cpt'] = nmp.histogram(data[:,rv], bins=bn.card(rv))[0]
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

Please help me.

Regards,
Sonu

Attachments: asia.bif.zip, lizards.csv.zip

ncullen93 commented 7 years ago

OK, I think I fixed it. pandas must have changed its indexing since I wrote this. It should work if 'data' is a pandas DataFrame whose columns are the same as the BN nodes, but I think it will now be broken if 'data' is a NumPy array.
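
As a rough illustration of the difference, the sketch below wraps an array in a DataFrame whose columns are named after the nodes. It uses only NumPy and pandas, not pyBN, and the random array is a placeholder for data loaded from a CSV. Selecting a column by node name fails on the plain array with the same kind of IndexError as in the traceback above, which is one plausible reading of that error, and succeeds on the DataFrame.

```python
import numpy as np
import pandas as pd

nodes = ['asia', 'smoke', 'tub', 'bronc', 'lung', 'either', 'dysp', 'xray']

# Placeholder for data loaded with np.loadtxt: 5 rows, one binary column per node.
array_data = np.random.RandomState(0).randint(0, 2, size=(5, len(nodes)))

# A plain NumPy array cannot be indexed by a node name.
try:
    array_data[:, 'asia']
    numpy_label_indexing_works = True
except IndexError:
    numpy_label_indexing_works = False

# A DataFrame with node-named columns supports label-based selection.
frame_data = pd.DataFrame(array_data, columns=nodes)
asia_column = frame_data['asia']

print(numpy_label_indexing_works)
print(list(frame_data.columns) == nodes)
```

If the CSV header row already holds the node names, loading it with `pd.read_csv(path)` gives a DataFrame with those column names directly.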

ncullen93 commented 7 years ago

Pull the repository and try again.