mao-ouyang closed this issue 5 years ago
Hi @mao-ouyang,
As far as I understand, two runs with the same dataset and parameters should give approximately the same result.
I tested with the sample taxi-small.tar.gz and the default parameters, and the result is ~2.50. (To keep it simple, I didn't use the trainWithEval option.) Then I tested again with another dataset, taxi-small-2012-11.zip, and the result is ~2.65. I made that dataset after fixing issue 58 locally. I think the difference is acceptable because I was testing with two different small datasets.
For the inconsistency you are seeing, please either share your sample dataset after ETL, or test again with taxi-small-2012-11.zip. Then we will be able to analyze the results based on the same dataset.
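To put a number on that gap, here is a minimal sketch that computes the relative difference between the two approximate results quoted above. The helper `relative_difference` and the 10% `TOLERANCE` are hypothetical, chosen only for illustration; they are not part of the project's scripts.

```python
def relative_difference(a: float, b: float) -> float:
    """Return |a - b| divided by the larger magnitude of the two values."""
    denom = max(abs(a), abs(b))
    if denom == 0.0:
        return 0.0  # both values are zero, so there is no difference
    return abs(a - b) / denom


# Approximate results reported in this comment.
result_sample = 2.50    # taxi-small.tar.gz
result_2012_11 = 2.65   # taxi-small-2012-11.zip

# Hypothetical threshold for "close enough"; pick one that suits your use case.
TOLERANCE = 0.10

diff = relative_difference(result_sample, result_2012_11)
print(f"relative difference: {diff:.1%}")
print("within tolerance" if diff <= TOLERANCE else "outside tolerance")
```

With these two values the relative difference comes out at about 5.7%. The same check can be applied to two runs on an identical dataset, where a much smaller gap would be expected.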
I'm assuming the issue was resolved by the latest suggestion. Feel free to reopen if you still have the problem.
I used the taxi-small data and compared two results of running the python spark cpu_main.py: one on the demo data, and one on the data produced by running the modified ETL script again. The first figure shows the result for the demo data, and the second shows the result for the ETL-processed data. Why do the two results differ?