Closed ludehon closed 5 years ago
By the way, @johnwmillr, is it possible to permanently remove #101 ?
Thanks for the new PR. I don't think I'm able to delete PRs, but this StackOverflow link says you can contact GitHub support and they'll delete the PR if it contains sensitive information.
Hi again @johnwmillr, I was wondering how I could design some tests functions. The problem is the content I get from the Genius API is not fixed and therefore tests can fail in the future. I was suggested by someone I could use a locally stored result from the API, and tests my functions with that, but it imply storing data files for tests purpose. What do you think ?
Thanks for keeping to work on this, @ludehon! Could you post some examples here of the queries you're sending to your get_referents
and get_song_annotations
functions?
Does the content from Genius change because users upload new annotations? If a list is returned for a particular song, would the first (or last?) element of the list always be the same?
If there is date metadata associated with annotations, can you sort by date, ensuring that the earliest post will always be first?
Hey all! Been interested recently in automating the construction of a corpus based on Genius data, came across this lovely project. This PR caught my eye, so I thought I'd chime in:
@johnwmillr: 1) I wouldn't hold my breath on getting a PR permanently deleted; GH's collaboration primitives are designed to be immutable, and with good reason. However, you can easily check the PR back out (looks like the branch was deleted, but if you recreate it should stick track), perform an interactive rebase to drop any commits with sensitive data, then force push the branch, leaving no record of the mistaken commits on the Github PR page. Here's roughly how I'd go about it:
$ git clone johnwmillr/LyricsGenius && cd LyricsGenius
# Fetch a reference to the original branch, so we can wipe its history.
$ git fetch origin refs/pull/101/head:annotations
# ^ PR# ^ Your original fork's branch name
$ git rebase -i HEAD~8
# ^ This is simply saying that we're going to rewrite history
# going back 8 commits, in this case. Adjust as you see fit...
# doesn't much matter since you're wiping history anyway, so long
# as you get everything.
Drop any/all commits you don't want public, then git push -f <your-remote>
should do the trick. Let me know if you'd like clarification on any of those steps.
2) There are plenty of great tools for doing exactly what you're looking for β the problem of "not-fixed test data" is a tale as old as QA engineering :P Many people's approach to simplify this is by taking the variability of the API out of the equation β e.g., by making an initial request during testing, automatically recording the request params + response, serializing that to disk, and replaying it back on subsequent test runs. That way you can focus on whether there's a regression in your logic when a test fails, rather than wondering about API flakiness, etc. I've used tools like this in a variety of languages, but not Python, although https://github.com/kevin1024/vcrpy seems to be in line with what you're looking for.
Wow, thank you for this, @jessestuart! These comments are a huge help. I'll try to get around to integrating all of this in the next few days.
@johnwmillr Sorry for the late response, here is an example of the functions :
To access song's annotations (you need the song's id) :
song_id = 873
g = Genius("access_token")
annotations = g.get_song_annotations(song_id)
for a in annotations:
print(a)
Does the content from Genius change because users upload new annotations?
In my opinion, since almost anyone can add annotations, any result content or length can change. And annotations can be added to any fragment of the song, but it seems they may be appended to the end of the annotations list (it is not specified by Genius).
If there is date metadata associated with annotations, can you sort by date, ensuring that the earliest post will always be first?
I searched in the returned JSON and only found two timestamps at referent["annotatable"]["client_timestamps"] For example :
referents = g.get_referents(song_id)
print(referents["referents"][0]["annotatable"]["client_timestamps"])
However I don't know how to use these timestamps : {'updated_by_human_at': 1556991479, 'lyrics_updated_at': 1560888793} (didn't find infos on it).
Afaik there is no way to sort by date.
Perhaps we can test the first annotation of a famous song that will not likely to be changed.
@ludehon, thanks for putting this together! I've cleaned up the methods and added tests with what you've put together. I'm relying on consistent responses from the API for the tests and am happy with that method until it breaks... π Going forward, I'd love to have tests using the vcrpy
module like @jessestuart suggested.
@johnwmillr I was kinda busy lately, I will have more free time now. I will look into vcrpy
for consistent tests !
Should I delete this fork and create an other one for tests ? I'm not sure about what to do...
Yes, your code will be more consistent if you create pull in changes from the master
branch. You shouldn't have to create a new fork to do this, but that may be the easiest route. It'd be great if you could work on the vcrpy
tests!
Hi again @johnwmillr, I looked at vrcpy
and tested the vcr decorator for consistent tests. It seems to work well since it stores cache data in files (JSON response).
I pushed changes on my fork, tell me if it's ok.
Yes, this looks like a good step forward. Are the VCR cache files stored locally? They're copies of the network traffic?
By using @vcr.use_cassette()
without filename in parameter, the file is stored locally next to the script that calls the decorated function.
For example, when testing @vcr.use_cassette() def test_get_song_annotations(self):
the file _/tests/test_get_songannotations is produced, and contains the HTTP GET data received (see attached file, I had to change extension because of github).
@ludehon, you can go ahead and open a PR that adds vcrpy
functionality.
get_song_annotations() can be used for getting all song's annotations from a song id. It returns a list of tuple : (fragment, list_of_annotations). Each fragment correspond to a highlighted part of the song, where people can write annotations. Resolve #100