Use jq to filter the JSON output.
Would you mind sharing a short code sample on how to approach this? Of course not doing it for me, just a little hint on how to do it?
It depends what you're trying to do. From my skimming of your OP, I didn't see a clear description of what output you're seeking or would find useful.
Plus, the JSON output is different between videos, playlists, & channels.
jq can do a whole lot more than merely filtering which elements are included / excluded. It can reprocess the output.
For (a simple) example, converting duration (which is an integer count of seconds) into the more familiar HH:MM:SS format. Some of my more elaborate & adventurous filtersets for jq (in the context of youtube-dl) output various statistics for a given playlist.
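To make the duration example concrete, here is a minimal sketch of that conversion, written in Python (the language youtube-dl itself runs on) rather than as a jq filter. The function name format_duration is hypothetical, and the sketch assumes duration is a non-negative count of seconds.

```python
def format_duration(seconds):
    """Turn an integer count of seconds into an HH:MM:SS string."""
    # Split off whole hours first, then whole minutes from what is left.
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(format_duration(102))  # a 102-second video
```

The hours field is not capped, so anything longer than a day simply shows more than 24 hours.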
There's plenty of documentation about jq on its website. I suggest reading it. If you can learn how to use youtube-dl then jq shouldn't be a problem, either.
Beyond that, jq is popular enough that you can search Q&A sites for solutions to specific cases, to use as inspiration.
Unless you mean how to use it as a CLI tool (passing the output of youtube-dl to jq), in which case you need a more general guide on using the command-line:
Use jq to filter the JSON output.
Would you mind sharing a short code sample on how to approach this?
Personally I would not do this. After many years, I learned that sometimes it's easier just to write an actual program, instead of trying to learn a new command line tool. For example, you could do this:
youtube-dl --id --skip-download --write-info-json LQ3Mu8A7gjY
Then format like this:
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

func main() {
	// Read the metadata file written by --write-info-json.
	buf, err := os.ReadFile("LQ3Mu8A7gjY.info.json")
	if err != nil {
		panic(err)
	}
	// Decode into a generic map, since only a few fields are needed.
	var m map[string]interface{}
	if err := json.Unmarshal(buf, &m); err != nil {
		panic(err)
	}
	fmt.Println(
		m["id"], "|", m["title"], "|", m["upload_date"], "|",
		m["duration"], "|", m["view_count"], "|", m["like_count"],
	)
}
Result:
LQ3Mu8A7gjY | All of Me (John Legend) - Duranka Perera | 20160327 | 102 | 81006 | 1013
This is brilliant! I'll take it from there. Thank you so much, exactly what I needed to get going, I'll be able to wrap it up myself. Thank you!
Thanks for taking the time to reply. What I don't appreciate, though, is the tone of your reply, which I found snarky and condescending. Before asking for help I'd spent significant time working on the issue. I would be perfectly fine if my question remained unanswered (nobody is obliged to help me out).
I was explicit in what I was trying to achieve, right after "the result for each video in file desired_output.txt would like like this:". @89z clearly saw that.
Not mad at you, just wanted to get the facts straight. Cheers
Not to address OP specifically, less experienced programmers may see every task as an opportunity for a new program. The experience of creating those programs can be a big part of learning whichever language and environment. After reaching some level of maturity, the realisation comes that every program is a potential maintenance problem, especially if the chosen platform is unstable, like Go, Rust, .Net, Java/ECMA/Script, or even C++ to some extent. These are the people who prefer POSIX shell scripts (apparently not our hosts, who originally ran GitHub in that way and then bought in some third-party tools), Perl if they have that weird bent, or Python for all the 2 vs 3 vs 3.6 vs 3.10 palaver.
From: pointy.haired@megacorp.example.com
To: wally@megacorp.example.com
Subject: Hello World

Please write me a Hello World program. I need it first thing tomorrow.
As yt-dl is being run by Python, that would probably be the natural choice if you were going to write a separate program to process JSON written by yt-dl. Or you could embed yt-dl as a module in your own program, so that the intermediate JSON is never written to a file or pipe. Obviously, the yt-dl codebase is full of examples of JSON processing.
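As a rough illustration of the separate-program route in Python, the sketch below reads the JSON written by --dump-json (one object per line) and produces pipe-separated rows. The function name summarize and the default field list are hypothetical choices for this example, with field names taken from elsewhere in this thread.

```python
import json

# Field names as they appear in youtube-dl's JSON output in this thread.
FIELDS = ("id", "title", "upload_date", "duration", "view_count", "like_count")


def summarize(json_lines, fields=FIELDS):
    """Turn an iterable of JSON lines into 'a | b | c' rows."""
    rows = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between objects
        info = json.loads(line)
        # dict.get() returns None for a missing field instead of raising.
        rows.append(" | ".join(str(info.get(name)) for name in fields))
    return rows
```

To use it on real output, pass an open file or sys.stdin as json_lines and print each returned row. A missing field shows up as None.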
Would you mind sharing a short code sample on how to approach this? Of course not doing it for me, just a little hint on how to do it?
jq is well documented and examples of use are not unknown to major web search engines. Having said that, this
youtube-dl -j -o - 'https://www.youtube.com/c/aliabdaal/videos?view=0&sort=da&flow=grid' 2>&1 | jq 'select(has("formats"))|{_type, ie_key, id, url, title, release_date, description, duration, view_count, like_count, uploader, name, dislike_count, repost_count, average_rating}'
might produce the sort of output desired. To unpack it:
- 2>&1 makes the POSIX shell redirect the standard error output (where the -j output goes) to the standard output for piping onward
- | is the POSIX pipe operator
- '...' expression is a jq filter surrounded by shell quotes
- select(...) picks JSON objects from the input stream matching the ... expression: in this case, ones with a formats member
- |, here, is jq's pipe operator, passing the selected objects to ...
- {...}, the JSON output specification, where each named item is included in the output JSON object with its value from the input object.

BTW ... | cat > ... is a long way, including an extra process, of doing ... > ... .
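For readers who find the filter easier to follow as ordinary code, the select-then-project logic unpacked above can be sketched in Python. This is only an analogy to the jq expression, not a replacement for it, and the function name select_and_project is hypothetical.

```python
def select_and_project(entries, keys):
    """Keep entries that have a "formats" member, reduced to the named keys."""
    result = []
    for entry in entries:
        # Counterpart of select(has("formats")): skip objects without it.
        if "formats" not in entry:
            continue
        # Counterpart of {id, title, ...}: a key missing from the input
        # becomes None here, much as jq yields null for it.
        result.append({key: entry.get(key) for key in keys})
    return result
```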
{Starts handing out popcorn} 😋
More seriously; some of the commenters may appreciate a read of The Art of Unix Programming (by Eric Raymond, author of Cathedral & Bazaar).
Avoiding the #ProgrammerFight, because there are always pros and cons and judgement calls,
... like Bash versus Zsh, Ksh93, Yash, or even Pdksh to some extent
that's why you write using the language defined in a mature version of POSIX.1 and validate with shellcheck.
there are always pros and cons and judgement calls
That was essentially my own thinking, too.
There are no (absolute, one-true-way) solutions, only trade-offs.
To each (one's|his) own.
Hence, instead of authoring an essay-length analysis of such, I cited that excellent book, which addresses relevant concepts (programming strategy, of a sort; high-level software architectural design), which says it all better than I could.
Or, with PR #30723:
youtube-dl 'https://www.youtube.com/c/aliabdaal/videos?view=0&sort=da&flow=grid' --print '%(id)s | %(title)s | %(description)s | %(urls)s | %(duration)s | %(view_count)s | %(like_count)s | %(uploader)s'
Description
I'm a macOS user. (The debug output for the -v parameter is down below, as it's the same for all of the commands and params I've tried.)
When I run the following command in terminal: youtube-dl https://www.youtube.com/c/aliabdaal/videos\?view\=0\&sort\=da\&flow\=grid --skip-download --dump-json -v | cat > videos-too-much-information.txt
I get a large file that contains a lot of information I don't need.
On the other hand, if I run youtube-dl https://www.youtube.com/c/aliabdaal/videos\?view\=0\&sort\=da\&flow\=grid --skip-download --dump-json --flat-playlist | cat > videos-too-little-information.txt then for every video in the playlist I get a line in the output file, looking like this:
{"_type": "url", "ie_key": "Youtube", "id": "XcZnSSmeK2I", "url": "XcZnSSmeK2I", "title": "How to prepare for BMAT Section 2 Physics, even if you're not doing it at A-Level | BMAT Tips series", "description": null, "duration": null, "view_count": 31077, "uploader": null}
which does not contain the information I needed.
I tried using the -o parameter to format the output strings, like so: youtube-dl https://www.youtube.com/c/aliabdaal/videos\?view\=0\&sort\=da\&flow\=grid --skip-download --dump-json -o "%(id)s | %(name)s | %(title)s | %(release_date)s | %(duration)s | %(view_count)s | %(like_count)s | %(dislike_count)s | %(repost_count)s | %(average_rating)s | %(comment_count)s"
but that would only result in a large file, too much information and "_filename": "NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA" ... added for each video in the playlist.
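The -o template uses Python-style %(name)s placeholders. Below is a rough sketch of how such a template can be filled from a metadata dictionary, with NA standing in for anything missing or empty. It is a simplified model for illustration, not youtube-dl's actual code, and the function name render_template is hypothetical.

```python
from collections import defaultdict


def render_template(template, info):
    """Fill %(name)s placeholders from info, using "NA" for absent fields."""
    # Assumption: fields whose value is None count as absent, so they fall
    # back to the same placeholder as fields that are missing entirely.
    known = {key: value for key, value in info.items() if value is not None}
    # defaultdict supplies "NA" whenever the % operator looks up a missing key.
    return template % defaultdict(lambda: "NA", known)


print(render_template("%(id)s | %(name)s | %(view_count)s",
                      {"id": "XcZnSSmeK2I", "view_count": 31077}))
```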
Proposition
Introduce additional parameter -jo (json output template) so when I run the command
youtube-dl https://www.youtube.com/c/aliabdaal/videos\?view\=0\&sort\=da\&flow\=grid --skip-download --dump-json -jo "%(id)s | %(name)s | %(title)s | %(release_date)s | %(duration)s | %(view_count)s | %(like_count)s | %(dislike_count)s | %(repost_count)s | %(average_rating)s | %(comment_count)s" | cat > desired_output.txt
then for each video passed as a parameter (or each video from the playlist passed as a parameter), youtube-dl will attempt to get the values of all the properties specified in the json output template. If a property is present and has a value, it returns this value as a string; if the property doesn't have a value, it returns an empty string; if the property doesn't exist, it returns null.
So, specifically, for the video from the example with not enough information, should I run the command with the new -jo switch and provide the template like above, the result for each video in file desired_output.txt would look like this:
{"_type": "url", "ie_key": "Youtube", "id": "XcZnSSmeK2I", "url": "XcZnSSmeK2I", "title": "How to prepare for BMAT Section 2 Physics, even if you're not doing it at A-Level | BMAT Tips series", "release_date": "20170623", "description": "My online BMAT video course (75+ videos) = https://courses.aliabdaal.com/bmat-crash-course-online\n\nToday's video tackles the approach to physics, which is arguably the most feared part of the BMAT, especially given that most medical applicants don't do physics at A-level. I talk about why you shouldn't ignore the physics questions, and give some tips about the order in which to learn stuff from the assumed knowledge guide, and then some tips about how to practice.\n\nUseful Links:\n\nBMAT Ninja - https://bmat.ninja - 1,200+ free questions that you can do online. You can pay \u00a329 for access to the worked solutions written by Oxbridge medical students, or you can apply for one of our bursaries (we give out hundreds of those each year).\n\nOfficial Section 2 Assumed Knowledge Guide - http://www.admissionstestingservice.org/for-test-takers/bmat/preparing-for-bmat/overlay.html\n\nBBC Bitesize - http://www.bbc.co.uk/education/subjects/zpm6fg8", "duration": "287", "view_count": 31077, "like_count":"619", "uploader": "Ali Abdaal", "name":null, "release_date":null, "dislike_count":null, "repost_count":null, "average_rating":""}
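The three-way rule proposed above (value as a string, empty string, or null) can be sketched in Python as follows. The -jo switch does not exist in youtube-dl, so everything here is hypothetical, including the function name apply_json_template. The sketch also assumes that "doesn't have a value" means the property is present but null in the metadata.

```python
import json


def apply_json_template(info, names):
    """Build the proposed output object for the requested property names."""
    result = {}
    for name in names:
        if name not in info:
            result[name] = None  # property doesn't exist: null
        elif info[name] is None:
            result[name] = ""  # property exists but has no value
        else:
            result[name] = str(info[name])  # value returned as a string
    return result


info = {"id": "XcZnSSmeK2I", "duration": None, "view_count": 31077}
print(json.dumps(apply_json_template(info, ["id", "duration", "view_count", "like_count"])))
```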
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://www.youtube.com/c/aliabdaal/videos?view=0&sort=da&flow=grid', '--skip-download', '--dump-json', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 2dc375acc
[debug] Python version 3.10.2 (CPython) - macOS-12.2.1-arm64-arm-64bit
[debug] exe versions: ffmpeg 5.0, ffprobe 5.0
[debug] Proxy map: {}