Closed bassel-zeidan closed 7 years ago
Hi @bassel-zeidan.
I have a couple of comments.
Having a default bucket_name is inconsistent with the other bluemix/softlayer objects. But since this lib is probably 100% used only in DSX, it's probably okay.
One thing though, NOT related to this PR, but something I noticed about the class is the "cos_id". There are two things with this that I'm not sure about.
Also, you should update the README to include the configuration name.
cos_configuration_name = 'some_configID'
cos = ibmos2spark.CloudObjectStorage(sc, credentials, cos_configuration_name)
I have another comment about the comments here: https://github.com/ibm-watson-data-lab/ibmos2spark/blob/341544978ca89357d213cfdf0de4c319fe3cc360/python/ibmos2spark/osconfig.py#L189-L203. But I don't have time right now to say what I want to say. I'll add another comment to this PR tomorrow.
Hi @gadamc,
Regarding bucket_name i think it is fine since it is passed as an "optional" param just as an optimisation step since most of the time, a user is trying to fetch data from the same bucket in a working session.
Regarding "cos_id" param, i think you have a very good point. I was thinking about this yesterday. I agree the name might be confusing to the consumer of the lib. I will change the name of the param. Although, "name" is not very clear to me. What do you think about "configuration_name" as a name for this param? It is straight to the point.
configuration name sounds good and I agree that name
isn't very clear either.
Maybe we should, probably in a separate PR just to be clear, change the parameter names in the other classes as well: name
-> configuration_name
.
Regarding the DSX documentation here: https://github.com/ibm-watson-data-lab/ibmos2spark/blob/341544978ca89357d213cfdf0de4c319fe3cc360/python/ibmos2spark/osconfig.py#L189-L203.
In principle, I'd like for the library to be as general as possible. I would like it to be useful for those submitting spark-submit jobs, and to compile the library into Java/Scala projects. With that goal in mind, we should remove mention of DSX in the documentation since DSX will be a user of the library and not intertwine them.
However, I DO think it's reasonable to mention how DSX uses the library in the README and provide examples there.
Make sense? What are your thoughts?
@gadamc makes sense. I did some updates on the comments.
Okay... Can you remove this section of text: https://github.com/ibm-watson-data-lab/ibmos2spark/blob/30526a1c2fee04a5321064069f2063e3e4272c71/python/ibmos2spark/osconfig.py#L189-L191
And, while we're at it, in a separate commit, could you remove https://github.com/ibm-watson-data-lab/ibmos2spark/blob/30526a1c2fee04a5321064069f2063e3e4272c71/python/ibmos2spark/osconfig.py#L135-L139
That information is good though and should probably be in the README, perhaps with a picture to show the 'Insert' dropdown.
Once you're done with the above comment remove, and added something to the README if you have it, then +1 to merge.
Test: https://apsportal.ibm.com/analytics/notebooks/06ff8a28-ec0f-44cc-a60c-c01c754c2bac/view?access_token=f42ec46ec715a74839316211c5dad5b9aa496bcdc064e055de149d58505170a9