neptune-ai / open-solution-mapping-challenge

Open solution to the Mapping Challenge :earth_americas:
https://www.crowdai.org/challenges/mapping-challenge
MIT License

No value provided for parameter 'test_data' #159

Closed newcoder0531 closed 5 years ago

newcoder0531 commented 6 years ago

When I executed "python main.py -- prepare_metadata --train_data --valid_data --test_data", the error message "neptune: Error: No value provided for parameter 'test_data'" appeared. My neptune.yaml parameters are:

parameters:
  # Data Paths
  data_dir: D:\torch\open-solution-mapping-challenge\data
  meta_dir: D:\torch\open-solution-mapping-challenge\data
  masks_overlayed_prefix: masks_overlayed
  experiment_dir: D:\torch\open-solution-mapping-challenge

Is there a problem with my settings? Or can you tell me where 'test_data', 'train_data' and 'valid_data' are defined? Where can I find the default values for these parameters? Thanks.

apyskir commented 6 years ago

Hi @newcoder0531! Thanks for your interest in our project! I'm glad that you reported this problem. I think it occurs when you try to run your experiment in offline mode - I didn't run into this error while running neptune run main.py prepare_metadata --train_data --valid_data --test_data. Although I haven't fixed the problem yet, I found a workaround: just execute python main.py -- prepare_metadata --train_data --valid_data --test_data -te. Hope it helps! We are of course working on fixing this issue; in case of further problems, just tell us about them. test_data, train_data and valid_data are boolean flags, so their default values are False, and when you put them on the command line they become True. They are defined in main.py.
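For illustration only, here is a minimal sketch of how boolean flags like these are typically declared with click; it assumes a click-based CLI and hypothetical help texts, so check main.py for the actual definitions:

```python
# Hypothetical sketch of boolean CLI flags (assumes a click-based CLI,
# which may differ from the project's actual main.py).
import click


@click.group()
def main():
    pass


@main.command()
@click.option('--train_data', is_flag=True, default=False, help='prepare metadata for the training set')
@click.option('--valid_data', is_flag=True, default=False, help='prepare metadata for the validation set')
@click.option('--test_data', is_flag=True, default=False, help='prepare metadata for the test set')
def prepare_metadata(train_data, valid_data, test_data):
    # is_flag=True makes the default False; passing the flag on the command
    # line flips the value to True.
    click.echo('train={}, valid={}, test={}'.format(train_data, valid_data, test_data))


if __name__ == '__main__':
    main()
```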

Regards, Andrzej

newcoder0531 commented 6 years ago

Thanks for your answer, @apyskir. Now, after training on a small dataset, I executed "python main.py -- evaluate_predict --pipeline_name unet_tta --chunk_size 1000" and got a file named "submission.json". I found that the value of "segmentation"["counts"] in each dict hasn't been decoded.

Below is an example:

{'image_id': 0, 'category_id': 100, 'score': 140.166575521481, 'segmentation': {'size': [300, 300], 'counts': '0[910000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000XJ5o1KbM6RM2[5HbM5XMNX5MaM2P3NPM1Q3OoL1Q3OmL5Q3KmL:ZMGZ5OYMk0e2UOVMQ1i2oNSMV1l2jNQMZ1n2fNQM\1n2dNQM^1n2bNQM_1o2aNQM_1o2aNPMa1o2_NPMb1P3^NPMc1o2]NPMd1P3\NPMe1o2 ...(I omitted the middle part) \2VKaMm4_2SKhL1>o4j2PKhL3<m4l2PKgL98g4Q3PKgL<5d4T3PKfL>5b4U3PKeLa03`4X3oJeLi5[3WJdLj5\330000000000000000000000000000000000000000000O100000000000000000000000000000000O1000000000000O1000000000000O10000O1000000O1N2I7K5O100T4'}, 'bbox': [0.0, 0.0, 299.0, 299.0]}

It seems to be due to the code in src/utils.py, line 102: annotation['segmentation']['counts'] = annotation['segmentation']['counts'].decode("UTF-8")

The decode function does not seem to actually decode the counts. Or is this by design?

Finally, I want to use my trained model to segment the test images and display the results. Where can I find helpful information?

apyskir commented 6 years ago

Hi @newcoder0531, submission.json is produced in a format accepted by the CrowdAI submission system. If you send it to the servers, they will score it properly. The output actually is decoded, just not into a human-readable format. Furthermore, scoring won't work if you don't use the decode() function. As far as displaying the results is concerned, the easiest way is to use the COCO interface and the showAnns function. You can find some information and examples here:
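As a hedged illustration of that suggestion: the "counts" string above is compressed COCO RLE, which pycocotools can turn into a binary mask and matplotlib can display. The file name and field layout below simply mirror the submission.json excerpt earlier in this thread:

```python
# Sketch: decode one RLE segmentation from submission.json and display it.
# Assumes pycocotools and matplotlib are installed; the file layout follows
# the excerpt posted above.
import json

import matplotlib.pyplot as plt
from pycocotools import mask as mask_utils

with open('submission.json') as f:
    annotations = json.load(f)

ann = annotations[0]
rle = dict(ann['segmentation'])
# Depending on the pycocotools version, 'counts' may need to be bytes.
if isinstance(rle['counts'], str):
    rle['counts'] = rle['counts'].encode('utf-8')

binary_mask = mask_utils.decode(rle)  # numpy array of shape (height, width)
plt.imshow(binary_mask)
plt.title('image_id={}, category_id={}'.format(ann['image_id'], ann['category_id']))
plt.show()
```

For overlaying many annotations on the original images, COCO's loadRes plus showAnns does the same job at scale.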

newcoder0531 commented 6 years ago

Hi @apyskir, I have a new problem. My computer runs Windows, and this code in src/callbacks.py, line 203:

    for step_name in pipeline.all_steps:
        cmd = 'touch {}'.format(os.path.join(cache_dirpath, 'transformers', step_name))
        subprocess.call(cmd, shell=True)

uses 'touch', which is a Linux command, so I want to modify it. But the code only seems to create some empty files, and these files don't appear to be used anywhere. Is this code necessary?
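A minimal cross-platform sketch of the same behavior, assuming pipeline and cache_dirpath are the same objects used in src/callbacks.py (an illustrative workaround, not the project's fix):

```python
# Sketch of a cross-platform replacement for the shell 'touch' call.
# Assumes `pipeline` and `cache_dirpath` are the objects used in
# src/callbacks.py; illustration only, not the project's actual fix.
from pathlib import Path

transformers_dir = Path(cache_dirpath) / 'transformers'
transformers_dir.mkdir(parents=True, exist_ok=True)

for step_name in pipeline.all_steps:
    # Path.touch() creates an empty file (or updates its mtime) and works on
    # both Windows and Linux, so no shell command is needed.
    (transformers_dir / step_name).touch(exist_ok=True)
```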

taraspiotr commented 6 years ago

Hi @newcoder0531, I will try to quickly answer your question. Yes, you are right: we only need to save weights for trainable transformers, and the other files are not necessary. There is already a PR to the steppy repo adding an is_trainable flag; only trainable transformers will be saved and non-trainable ones won't, so it will no longer be necessary to "create" them with the touch command.

newcoder0531 commented 6 years ago

Hi @taraspiotr, thanks for your answer. Now I am doing something that may seem silly. I selected 500 images as a training set and 100 images from the training set as a validation set; the test set is the same as the validation set. I want to overfit the network. In fact, it works well during the train step: the average precision in cross-validation can reach 70% after only 10 epochs of training. But when I evaluated with the command python main.py -- evaluate --pipeline_name unet_weighted --chunk_size 20, the average precision is less than 3%. I don't know what caused this big difference. Same input, same network - why are the outputs so different between 'train' and 'evaluate'?

apyskir commented 5 years ago

Hi @newcoder0531, sorry for the late response. Did you solve the problem? Keep in mind that evaluate makes its own train_test_split, so make sure you have the same data in both cases. Also, the postprocessing pipeline used during training is simplified, so during evaluation you use a considerably more complicated one. Those are the ideas that come to my mind.
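To make the first point concrete, here is a minimal sketch (the metadata file name and split parameters are hypothetical, not the project's actual values) of pinning the split so that the training and evaluation runs score identical validation images:

```python
# Sketch: make the train/validation split reproducible so that 'train' and
# 'evaluate' see the same validation images. The file name and parameters
# here are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

meta = pd.read_csv('metadata.csv')
train_meta, valid_meta = train_test_split(meta, test_size=0.2, random_state=1234)

# Reusing the same random_state (and the same metadata file) in both runs
# guarantees that valid_meta contains identical rows each time.
print(len(train_meta), len(valid_meta))
```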

newcoder0531 commented 5 years ago

Hi @apyskir, thanks for your response. The problem has not been resolved, and it also occurred when I used the full 280,000-image dataset. (For this problem, I switched to Linux and downgraded torch to version 0.3.1.)

But the evaluate results do not seem to affect the accuracy of predict. This is the result of evaluate (the model was trained for 13 epochs on the default 280,000-image dataset and achieved 70% precision in cross-validation, but only 7% in evaluate; the val set and test set are the same as the CrowdAI dataset):

[screenshot: evaluate results]

And below are the submission.json results after decoding:

[screenshots: decoded submission.json results]

I don't know if you have run into such a problem. I will look for some debugging tips to output intermediate results, but not right now. If I find the reason, I will let you know.