wentaozhu / deep-mil-for-whole-mammogram-classification

Zhu, Wentao, Qi Lou, Yeeleng Scott Vang, and Xiaohui Xie. "Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification." MICCAI 2017.
MIT License

inbreast.py error #2

Closed djones4487169 closed 7 years ago

djones4487169 commented 7 years ago

Hi Wentao - I'm making progress but now get the following error in inbreast.py at line 158:

for train, test in skf.split(x,y):

Error:

File "C:\Python35\lib\site-packages\sklearn\utils\validation.py", line 126, in num_samples
    " a valid collection." % x)
TypeError: Singleton array array(dict_values([0, 0, 0, 1, 0, 0, 1, 1, ...]), dtype=object) cannot be considered a valid collection.

wentaozhu commented 7 years ago

Hi, I think you should check the shapes of x and y in for train, test in skf.split(x,y):. Make sure they are consistent with what your version of skf.split expects. These packages are updated all the time, and some interfaces may have changed. I think it will be easy to fix. Good luck. If you cannot fix it, tell me the specific problem with more information and I will spend some time on it. Thanks!

djones4487169 commented 7 years ago

Thanks Wentao - I will let you know how I get on and much appreciated if you can help get it working in Keras 2.0.

David

djones4487169 commented 7 years ago

Hi Wentao, I have downgraded to Keras 1.2 hoping it would help, but I still get the same error message in for train, test in skf.split(x,y): and can't get past this point.

David

djones4487169 commented 7 years ago

BTW I have upgraded back to Keras 2.0 since I need it for other projects but I guess the error is based on something else though! It would be great if you could get an updated version working please?

wentaozhu commented 7 years ago

Check the scikit-learn version and the skf.split function. You can check the usage of the function at http://scikit-learn.org/stable/modules/cross_validation.html. Hope it helps you.

djones4487169 commented 7 years ago

Hi Wentao,

I've spent some time looking at the error:

for train, test in skf.split(x,y): in run_cnn_k_new.py

In the usage description, both the x and y structures are the same as those input here. The only thing I can think of is that mydict created from readLabel() is formatted in a way skf.split() does not like when fed into cvsplitenhance(fold, totalfold, mydict, valfold=valfold)

I can't see the difference between the data structure in their example and the one here?

David

wentaozhu commented 7 years ago

You can use type(x), type(y) to get the types of x and y easily. If they are consistent with what split() expects, you can send me an email and I can help you debug, if that is okay for you.

djones4487169 commented 7 years ago

I get for type x, y:

<class 'dict_keys'> <class 'dict_values'>

Does that help?
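For anyone hitting the same TypeError, the two types above are the likely explanation. In Python 3, dict.keys() and dict.values() return view objects rather than lists, and NumPy wraps such a view in a 0-dimensional object array, which matches the "Singleton array" in the traceback. A minimal sketch, using a made-up three-entry mydict in place of the one returned by readLabel():

```python
import numpy as np

# Hypothetical stand-in for the dict returned by readLabel(): image id -> label.
mydict = {101: 0, 102: 1, 103: 0}

# NumPy does not treat a dict view as a sequence, so it is stored as one
# opaque object inside a 0-dimensional array.
as_view = np.asarray(mydict.values())
print(as_view.shape, as_view.dtype)

# Materialising the view as a list first gives an ordinary 1-D array.
labels = np.array(list(mydict.values()))
print(labels.shape, labels.dtype)
```

The second form is the kind of array-like input that split() can count samples on.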

wentaozhu commented 7 years ago

According to the following documentation, x and y should be array-like, for example NumPy arrays.

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html#sklearn.model_selection.StratifiedKFold

split(X, y, groups=None)[source]

Generate indices to split data into training and test set.
Parameters: 

X : array-like, shape (n_samples, n_features)

    Training data, where n_samples is the number of samples and n_features is the number of features.

    Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.

y : array-like, shape (n_samples,)

    The target variable for supervised learning problems. Stratification is done based on the y labels.

groups : object

    Always ignored, exists for compatibility.

Returns:    

train : ndarray

    The training set indices for that split.

test : ndarray

    The testing set indices for that split.

djones4487169 commented 7 years ago

OK so far I have had to make the following changes to inbreast.py to update to Python 3:

In: def cvsplitenhance(fold, totalfold, mydict, valfold=-1):

y2 = np.fromiter(iter(mydict.values()), dtype=int)
x2 = np.fromiter(iter(mydict.keys()), dtype=int)
for train, test in skf.split(x2,y2):

In: def loaddataenhance(fold, totalfold, valfold=-1, valnum=60):

mydictkey = list(mydict.keys())
mydictvalue = list(mydict.values())

Also change xrange to range where applicable.
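The conversion described above can be tried in isolation. The sketch below is not the repository's code: mydict is a hypothetical stand-in with 20 integer ids, and n_splits=5 is an arbitrary choice. It only shows that once keys and values are real arrays, StratifiedKFold.split() accepts them and yields index arrays.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for mydict from readLabel(): image id -> 0/1 label.
mydict = {i: int(i % 4 == 0) for i in range(20)}

ids = np.array(list(mydict.keys()))
labels = np.array(list(mydict.values()))

skf = StratifiedKFold(n_splits=5)
folds = []
for train_idx, test_idx in skf.split(ids, labels):
    # split() yields integer positions into ids/labels, not the ids themselves.
    folds.append((ids[train_idx], ids[test_idx]))

print(len(folds), [len(test) for _, test in folds])
```

Because the split is stratified on labels, each test fold keeps roughly the same share of positive cases as the full set.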

Now there is a problem with encoding/decoding the pickle files at:

im = cPickle.load(inputfile)

Error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xac in position 0: ordinal not in range(128)

I'm using 64-bit Windows 8.0 and when I use print(sys.getdefaultencoding()) I get utf-8

Any ideas how to solve this one?

djones4487169 commented 7 years ago

It appears that forcing encoding='latin1' fixes the codec problem. That is:

im = cPickle.load(inputfile, encoding='latin1')

Also, all occurrences of W_regularizer=l1l2() should be changed to W_regularizer=l1_l2() in the model defs
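The codec fix can be reproduced without the dataset. The sketch below assumes the pickle files were written by Python 2; the hand-written byte string py2_bytes and the helper name load_py2_pickle are illustrative only. Note that Python 3 has no cPickle module, so the standard pickle module is used here.

```python
import io
import pickle


def load_py2_pickle(fileobj):
    """Load a pickle that is assumed to have been written by Python 2."""
    # latin1 maps every byte 0-255 to a code point, so decoding cannot fail.
    return pickle.load(fileobj, encoding="latin1")


# Protocol-0 pickle of the Python 2 byte string '\xac', written by hand
# purely for illustration; real files would come from the repository's data.
py2_bytes = b"S'\\xac'\n."

try:
    pickle.load(io.BytesIO(py2_bytes))  # default encoding is ASCII
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

value = load_py2_pickle(io.BytesIO(py2_bytes))
print(default_failed, repr(value))
```

The Python documentation recommends encoding='latin1' for unpickling NumPy arrays saved by Python 2, which is probably why it works for these image files.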

djones4487169 commented 7 years ago

Hi Wentao,

Error when running: run_cnn_k_new.py:

Negative dimension size caused by subtracting 11 from 3 for 'conv_1/convolution' with input shape: [?,3,227,227], [11,11,227,96]

I guess this is a dim_ordering issue between TensorFlow and Theano (I'm using the tf backend). Do you know a way to change this? I've tried a few things but it still crashes.

David

djones4487169 commented 7 years ago

Hi Wentao,

Solved quite a lot of the problems and been updating on Github for anyone else using the software to make the appropriate changes where necessary. You can check the comments and let me know if you think the changes should have correctly solved the errors in inbreast.py using Keras 2 and Python 3.5.

I now get an error when running: run_cnn_k_new.py:

Negative dimension size caused by subtracting 11 from 3 for 'conv_1/convolution' with input shape: [?,3,227,227], [11,11,227,96]

I guess this is a dim_ordering issue between TensorFlow and Theano (I'm using the tf backend). Do you know a way to change this? I've tried a few things but it still crashes.

David

wentaozhu commented 7 years ago

Yes. Check your Keras configuration file. If the channel is the last dimension, change it to the first dimension. I think that will work, because in [?,3,227,227] the channel is the first dimension. Or you can make a minor change to the input data. Thanks!
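The arithmetic behind the error message can be checked by hand. The sketch below is not Keras code; conv_output_size and describe are hypothetical helpers that only mimic how the same shape tuple is read under the two data formats, using the 11x11 kernel from the error and the stride of 4 from the model definition.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output length of a 'valid' (no padding) convolution along one axis."""
    if input_size < kernel_size:
        raise ValueError(
            "kernel %d does not fit into input of size %d"
            % (kernel_size, input_size)
        )
    return (input_size - kernel_size) // stride + 1


def describe(shape, data_format):
    """Return (height, width, channels) as the backend would read `shape`."""
    if data_format == "channels_first":
        channels, height, width = shape
    else:  # "channels_last"
        height, width, channels = shape
    return height, width, channels


shape = (3, 227, 227)  # per-sample shape from the error message

# channels_first: a 227x227 image with 3 channels, so the 11x11 kernel fits.
h, w, c = describe(shape, "channels_first")
print(c, conv_output_size(h, 11, stride=4), conv_output_size(w, 11, stride=4))

# channels_last: the same tuple is read as a 3x227 image with 227 channels.
h, w, c = describe(shape, "channels_last")
print(h, w, c)  # height 3 is smaller than the kernel size 11
```

Read as channels last, the height is 3, and subtracting the kernel size 11 from it gives the negative dimension in the message; the filter shape [11,11,227,96] shows the same misreading, with 227 taken as the channel count.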

djones4487169 commented 7 years ago

I changed the "image_data_format" line in the keras.json config file:

"backend": "tensorflow",
"epsilon": 1e-07,
"image_data_format": "channels_first",
"floatx": "float32"

But now the program just hangs at:

conv_1 = Convolution2D(96, 11, 11,subsample=(4,4),activation='relu', W_regularizer=l1_l2(l1=l1factor, l2=l2factor), name='conv_1')(inputs)

in def AlexNet(). There is no error message; it just stalls and won't go any further. Any ideas?
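The config edit described above can also be scripted, which makes it easier to switch back and forth while testing. This is a sketch using only the standard json module; set_image_data_format is a hypothetical helper, and the default location ~/.keras/keras.json is the usual one for Keras 2 but may differ on a given machine.

```python
import json
import os


def set_image_data_format(config_path, data_format):
    """Rewrite the image_data_format entry of a keras.json-style file."""
    if data_format not in ("channels_first", "channels_last"):
        raise ValueError("unknown data format: %r" % (data_format,))
    with open(config_path) as handle:
        config = json.load(handle)
    config["image_data_format"] = data_format
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config


# Usual location of the file; nothing is modified unless the helper is called.
default_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
print(default_path)
```

Keras reads this file when it is imported, so the Python session has to be restarted for a change to take effect.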

wentaozhu commented 7 years ago

Restart. If the problem persists, it is probably a Keras bug. In that case you need to swap the axes of the input data so that it fits the channels-last layout. Make sure your Keras configuration is set to channels last, then run again.
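Swapping the axes as suggested can be done with a single NumPy transpose. The sketch below uses a small made-up batch rather than the repository's data, and assumes the arrays are laid out as (samples, channels, height, width).

```python
import numpy as np

# Hypothetical batch in channels-first layout: (samples, channels, height, width).
batch_first = np.arange(2 * 3 * 227 * 227).reshape(2, 3, 227, 227)

# Move the channel axis to the end: (samples, height, width, channels).
batch_last = np.transpose(batch_first, (0, 2, 3, 1))
print(batch_last.shape)

# The inverse permutation restores the original layout.
restored = np.transpose(batch_last, (0, 3, 1, 2))
print(restored.shape)
```

The model's Input shape then has to match the new layout, that is (227, 227, 3) instead of (3, 227, 227).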

djones4487169 commented 7 years ago

I've changed this:

inputs = Input(shape=(3,227,227))

to:

inputs = Input(shape=(227,227,3))

so it is consistent with TensorFlow's channels_last, but I still get the same problem of it hanging. Any other ideas? It's a shame, I think I'm almost there!

wentaozhu commented 7 years ago

I recommend using Theano because you are running on Windows.

djones4487169 commented 7 years ago

What do I need to change to run Theano? I think I've tried this in the past and got some issues with 64-bit not being installed.

wentaozhu commented 7 years ago

TensorFlow does not support Windows well. I think that may be the reason.

djones4487169 commented 7 years ago

I get the following error (see attached).

Any ideas?

djones4487169 commented 7 years ago

Hi Wentao,

Just a quick one:

How can I look at one of the train or test images? Can you do this with a pickle file, or do we access the images elsewhere? I just want to see what they look like before going into the neural network.

Regards

David

wentaozhu commented 7 years ago

Hi David,

If I were you, I would spend a week learning Python first. Good luck!

Best, Wentao

djones4487169 commented 7 years ago

I totally agree Wentao!

djones4487169 commented 7 years ago

I finally got the scripts working and running: run_cnn_k_mysparsemil_new.py

djones4487169 commented 7 years ago

Hi Wentao,

I'm running: run_cnn_k_mysparsemil_new.py and getting output (see attached):

  1. Do you recognise the Keras warning?

  2. Is it correct that acc & val_acc are all 0.7561 and other metrics are 0.00, e.g. prec, etc.? I have actually run it for over 50 epochs and still see the same pattern.

Also, what is the main difference between mil, mymil and mysparsemil scripts?

Thanks

David

wentaozhu commented 7 years ago

You need to debug. Running more epochs may help.
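One thing worth checking while debugging: accuracy stuck at a single value with precision at 0 is what a model produces when it predicts the negative class for every sample. This is a guess about the cause, not something confirmed in the thread. The sketch below uses illustrative label counts (31 negatives, 10 positives) chosen only because 31/41 rounds to 0.7561; binary_metrics is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for 0/1 labels (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Illustrative label counts only; these are not taken from the thread.
y_true = [0] * 31 + [1] * 10
y_pred = [0] * len(y_true)  # a model that always predicts "negative"

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(round(accuracy, 4), precision, recall)
```

If the predicted labels on the validation set are all 0, the usual suspects are the learning rate, the regularization strength, or the class imbalance rather than the number of epochs.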

djones4487169 commented 7 years ago

OK - what are the main differences between the mil, mymil and mysparsemil scripts? It's hard to see from the code.

wentaozhu commented 7 years ago

mil is the max pooling based mil. Please read the readme file

run_cnn_k_new.py is used for alex net.
run_cnn_k_mil_new.py is used for max pooling based deep mil.
run_cnn_k_mysparsemil_new.py is used for sparse deep mil.
run_cnn_k_mymil_new.py is used for label assignment based deep mil. Here we finetuned weights from max pooling based deep mil.

djones4487169 commented 7 years ago

Hi Wentao,

Can you explain in more detail the following functions in the ImageGenerator please:

featurewise_center=False, # set input mean to 0 over the dataset

samplewise_center=False, # set each sample mean to 0

featurewise_std_normalization=False, # divide inputs by std of the dataset

samplewise_std_normalization=False, # divide each input by its std

zerosquare=True,

zerosquareh=noises,

zerosquarew=noises,

zerosquareintern=0.0

What are noises? Do you think these augmentations make a large difference? Why did you not use the standard image generators but create your own?

David

djones4487169 commented 7 years ago

Hi Wentao,

Do you have the code snippets that created the segmentation patches from the paper in Figure 4?

Thanks David

c1a1o1 commented 7 years ago

I finally got the scripts working and running: run_cnn_k_mysparsemil_new.py

What did you edit?

wentaozhu commented 7 years ago

What do you mean by edit?

c1a1o1 commented 7 years ago

I now get an error when running: run_cnn_k_new.py:

Negative dimension size caused by subtracting 11 from 3 for 'conv_1/convolution' with input shape: [?,3,227,227], [11,11,227,96]

The same error!

wentaozhu commented 7 years ago

You should make sure the input size is [?,3,227,227]. You can easily do this by resizing the image as the first step.

c1a1o1 commented 7 years ago

Thank you wentao, I do appreciate the quick response.