Adding html_description attribute to the Cache object

DarkOutcast6 commented 1 year ago

As stated in issue #216, you cannot currently acces the raw HTML version of a description. So, I'm working on adding a html_description attribute to the Cache object. I understand it may not work, but I'm semi-confident I understand Beautiful Soup well enough and added the right code. Bear with me though, as the code could definitely contain mistakes.

FriedrichFroebel commented 1 year ago

There is a typo in your code: It should read description_html instead of description. Additionally, do you see any way to avoid doing the same tree querying twice for both the description and description_html attribute. And please add a corresponding test for this as well.

FriedrichFroebel commented 1 year ago

Thanks for the test. Could you please add a check for the actual parsing from the HTML as well by extending one of the existing cassette-based tests? Additionally, CI is currently failing due to code style violations.

Do you have a solution for this?

Additionally, do you see any way to avoid doing the same tree querying twice for both the description and description_html attribute?

DarkOutcast6 commented 1 year ago

I'm not entirely sure what you mean by the following. (I apologize, as I am a novice with these type of things)

Could you please add a check for the actual parsing from the HTML as well by extending one of the existing cassette-based tests?

I think I understand what you mean by

avoid doing the same tree querying twice for both the description and description_html attribute?

If I do understand you correctly, I'm working on it now.

FriedrichFroebel commented 1 year ago

I'm not entirely sure what you mean by the following. (I apologize, as I am a novice with these type of things)

Could you please add a check for the actual parsing from the HTML as well by extending one of the existing cassette-based tests?

Your current test only covers explicitly setting the description_html field in one case, but does not ensure that the parsed data from the Geocaching.com page is indeed correct. AFAIK this is not tested for the description field either, but checking this makes sense.

I think I understand what you mean by

avoid doing the same tree querying twice for both the description and description_html attribute? If I do understand you correctly, I'm working on it now.

You only fixed this for exactly one case, but not for the remaining ones where the same approach might make sense as well.

Additionally, when you look at the CI runs here, you will see that there currently are code style violations. And https://github.com/tomasbedrich/pycaching/pull/218/files#diff-c1861c7962a2fda2a943282daf8c8cb0030863a5cfdf8acc43402e3f1b381857R157 is considered both a name violation according to PEP8 and invalid variable name syntax.

DarkOutcast6 commented 1 year ago

Sorry for the trouble -- I feel so close :)

Pretty sure I figured out what the code violation is. (correct me if I didn't fix it or if there is more)

You only fixed this for exactly one case, but not for the remaining ones where the same approach might make sense as well.

I looked over the code and can't find another case where this occurs.

Your current test only covers explicitly setting the description_html field in one case, but does not ensure that the parsed data from the Geocaching.com page is indeed correct. AFAIK this is not tested for the description field either, but checking this makes sense.

I understand what you mean, but how would I actually do this?

Sorry for sounding like an idiot through this whole process lol. Hope I'm not too annoying! :wink:

FriedrichFroebel commented 1 year ago

Pretty sure I figured out what the code violation is. (correct me if I didn't fix it or if there is more)

Your variable names should use an underscore, not a dash (this is invalid Python syntax). Additionally, CI is reporting an empty line too much. This needs to be fixed.

I looked over the code and can't find another case where this occurs.

You added changes to https://github.com/tomasbedrich/pycaching/pull/218/files#diff-c1861c7962a2fda2a943282daf8c8cb0030863a5cfdf8acc43402e3f1b381857R832 and https://github.com/tomasbedrich/pycaching/pull/218/files#diff-c1861c7962a2fda2a943282daf8c8cb0030863a5cfdf8acc43402e3f1b381857R958 as well, not only at https://github.com/tomasbedrich/pycaching/pull/218/files#diff-c1861c7962a2fda2a943282daf8c8cb0030863a5cfdf8acc43402e3f1b381857L157 You should use the same approach all over.

I understand what you mean, but how would I actually do this?

There are multiple tests in test.test_cache.TestMethods. Consider extending the tests for the three loader methods to test for description (if currently missing) and description_html, ensuring that the HTML variant includes HTML tags, while the plain-text one does not. https://github.com/tomasbedrich/pycaching/blob/bef4ca07693d4d1e06b070f4c72a154ca3f487a4/test/test_cache.py#L282 is an example of an existing simple description test where the existing assertion probably should be expanded to cover a case where HTML tags have been removed (which is not the case here yet).

DarkOutcast6 commented 1 year ago

What is the difference between self.assertIn(), self.assertEqual() and self.assertGreater() and which of these should be used for description_html?

FriedrichFroebel commented 1 year ago

I recommend you to familiarize yourself with general unittest-based testing (which is the stdlib Python module for testing) to do so. You will find the official documentation for the different assertions at https://docs.python.org/3/library/unittest.html#assert-methods

DarkOutcast6 commented 1 year ago

What else do I need to add for this to be valid? It be very helpful if you could give me a list of things I need to edit.

FriedrichFroebel commented 1 year ago

The code style needs to be fixed. See the CI runs for this or run black in your local development environment to check/fix this.
As mentioned in the second section at https://github.com/tomasbedrich/pycaching/pull/218#issuecomment-1712584664, please consider re-using the retrieved tree for the text and string rendering as you have done for the other case.
As mentioned in the third section at https://github.com/tomasbedrich/pycaching/pull/218#issuecomment-1712584664, please extend the tests for the three loader methods you changed with suitable tests (the test you added in your latest commit would probably fail for this specific geocache instance). To do so, please have a look at the original descriptions of the corresponding geocaches used for testing purposes.
- Add a corresponding assertion for a part of the description_html property, ensuring that it does indeed include the correct HTML tags.
- Improve/rewrite the current assertion for the description property to check for exactly the same section as in description_html with the only difference being that the expected content does indeed not contain the corresponding HTML tags.

DarkOutcast6 commented 1 year ago

The code style needs to be fixed.

I fixed the black code styling issues.

re-using the retrieved tree for the text and string rendering as you have done for the other case.

Made it so that it now re-uses the tree.

The CI runs now state that there is a problem with flake8. The error reads:

./pycaching/cache.py:628:5: F811 redefinition of unused 'description_html' from line 619 @description.setter ^ 1 F811 redefinition of unused 'description_html' from line 619

I don't fully understand what it means. Could you explain what it means?

FriedrichFroebel commented 1 year ago

Thanks for the fixes. I have added two other comments to the diff. Additionally, the required tests are still missing or incomplete and should be implemented as requested previously.

DarkOutcast6 commented 1 year ago

Additionally, the required tests are still missing or incomplete and should be implemented as requested previously.

Working on this now, hopefully will be finished this week! If there is anything else that needs to be brought to my attention, feel free to tell me now!

DarkOutcast6 commented 1 year ago

Add a corresponding assertion for a part of the description_html property, ensuring that it does indeed include the correct HTML tags.

Just fixed it!

Hypothetically, this should work now, correct? If so, is there anything else I need to update? (And if it isn't, I'll update accordingly)

FriedrichFroebel commented 1 year ago

You implemented the new property for the three different loading methods. I can find (at most) only two tests which check that loading the data does indeed work correctly for description_html. We should cover all three cases.

I previously requested that the corresponding tests for the description should be adopted to actually show that they do not contain HTML tags. This is not the case at the moment, thus we cannot see from the tests if description does indeed not contain any HTML tags, as there is nothing to strip for the existing text. Ideally, your test snippet for description and description_html would be identical with only the tags differing, while having HTML tags in-between as well, not only at the borders.

Additionally, linting currently fails due to the long line. Do you really need such a long text for the test or could you shorten this to still cover the relevant cases (see previous paragraph), but not being too verbose?

DarkOutcast6 commented 1 year ago

description should be adopted to actually show that they do not contain HTML tags. This is not the case at the moment, thus we cannot see from the tests if description does indeed not contain any HTML tags, as there is nothing to strip for the existing text. Ideally, your test snippet for description and description_html would be identical with only the tags differing, while having HTML tags in-between as well, not only at the borders.

Pretty sure I just fixed it.

Additionally, linting currently fails due to the long line.

Also pretty sure I just fixed it.

You implemented the new property for the three different loading methods. I can find (at most) only two tests which check that loading the data does indeed work correctly for description_html. We should cover all three cases.

Not really sure what you mean...?

FriedrichFroebel commented 1 year ago

You changed _from_print_page, load_quick and load_by_guid. You might add a new test for _from_print_page (which does not seem to exist at the moment), while extending test.test_cache.TestMethods.test_load and test.test_cache.TestMethods.test_load_by_guid.

DarkOutcast6 commented 1 year ago

Looks like it's working!

By the way, what does mean and is it important?

You changed _from_print_page, load_quick and load_by_guid. You might add a new test for _from_print_page (which does not seem to exist at the moment), while extending test.test_cache.TestMethods.test_load and test.test_cache.TestMethods.test_load_by_guid.

Do I still need to this? If so, what specifically must I change/add?

DarkOutcast6 commented 1 year ago

Thank you so much for the help, @FriedrichFroebel! Do you know when this will be added to a release?

FriedrichFroebel commented 1 year ago

Version 4.4.0 has been released including this new feature.

tomasbedrich / pycaching

Adding html_description attribute to the Cache object #218