Running taxi results inconsistent

mao-ouyang commented 5 years ago

I am using the taxi-small data and comparing two results: one from running the python spark cpu_main.py, and one from running it again after the modified ETL script. The first figure is the result with the demo data, and the second is the result with the ETL-processed data. Why are the two results different?

chuanlihao commented 5 years ago

Hi @mao-ouyang,

As far as I understand, two runs should give approximately the same result when they use the same dataset and parameters.

I tested with the sample taxi-small.tar.gz and the default parameters; the result is ~2.50. (To keep it simple, I didn't use the trainWithEval option.) Then I tested again with another dataset, taxi-small-2012-11.zip; the result is ~2.65. I made that dataset after fixing issue 58 locally. I think the difference is acceptable because I was testing with two different small datasets.
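As a rough illustration of how such a comparison could be quantified, here is a minimal Python sketch that computes the relative difference between the two reported values. The helper names and the 10% tolerance are assumptions made for this example only; they are not part of spark-examples, and the right tolerance depends on the metric and dataset size.

```python
def relative_difference(a: float, b: float) -> float:
    """Relative difference between two metric values, scaled by the larger magnitude."""
    denom = max(abs(a), abs(b))
    if denom == 0:
        return 0.0  # both values are zero, so they are identical
    return abs(a - b) / denom


def results_agree(a: float, b: float, tolerance: float = 0.10) -> bool:
    """True if the two values differ by no more than `tolerance` (hypothetical 10% default)."""
    return relative_difference(a, b) <= tolerance


# Approximate values reported in this thread.
sample_result = 2.50   # taxi-small.tar.gz, default parameters
rebuilt_result = 2.65  # taxi-small-2012-11.zip

diff = relative_difference(sample_result, rebuilt_result)
print(f"relative difference: {diff:.1%}")
print("within tolerance:", results_agree(sample_result, rebuilt_result))
```

With the two values above, the relative difference is a little under 6%, which is consistent with treating the gap as expected variation between two different small datasets.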

For the inconsistency you are seeing, please either share your sample dataset after ETL, or test again with taxi-small-2012-11.zip. Then we will be able to analyze the results based on the same dataset.

anfeng commented 5 years ago

Assuming the issue was resolved per the latest suggestion. Feel free to reopen if you still have the problem.

rapidsai / spark-examples

Running taxi results inconsistent #59