Closed Kabouik closed 3 years ago
Hi there - thank you for your question!
I'd have to see if there is a way to "detect" this type of format and output the table the way you describe. Alternatively, I could add a program flag that supports this structure. In the meantime, it's not too difficult to transform this structure into something that jtbl likes. You can use a tool like jq or jello. Here is how you could do it in jello using Python syntax:
$ cat data.json | jello '\
result = []
for date, location in zip(_.Date, _.Location):
    result.append({"date": date, "location": location})
result' | jtbl
date                 location
-------------------  -----------------------------------------
2021-08-24 21:00:00  Indonesia - TERNATE/BABULLAH
2021-08-24 21:00:00  Bolivia - TRINIDAD
2021-08-24 21:00:00  United States - PENSACOLA, FL
2021-08-24 21:00:00  Russian Federation - IVDEL'
Thanks for the quick answer @kellyjonbrazil! I am no expert at all so I am not even sure that this json format is optimal, but I was told that at least it is correct. In fact, I would be happy to change it if someone more educated about json thinks it can be improved for the type of data I'm dealing with.
I'll monitor this issue closely in case you can implement a flag for this type of data (if I don't change my json structure first), but until then the trick you shared will do just fine! It is not straightforward to integrate it in the wrapper bash script I am working on to scrape and manage the data, though. I would like to avoid having to use an extra script file just for showing tables, so if I could do that just in bash, it would be easier for me.
There is no one right way to do it, but if you want the data to work with jtbl natively, it should look like this:
[
  {
    "date": "2021-08-24 21:00:00",
    "location": "Indonesia - TERNATE/BABULLAH"
  },
  {
    "date": "2021-08-24 21:00:00",
    "location": "Bolivia - TRINIDAD"
  },
  {
    "date": "2021-08-24 21:00:00",
    "location": "United States - PENSACOLA, FL"
  },
  ...
]
The jello JSON processing above is converting the original format into this format. This format is more descriptive than your original format because it pairs the associated data together, so no one needs to guess that the date and locations are related.
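For readers who prefer plain Python over a jello query, the same column-to-row conversion can be sketched as a small function. The name columns_to_rows is made up for this illustration, and the sample data is copied from the table above; this is not part of jello or jtbl.

```python
def columns_to_rows(columns):
    """Turn {"Date": [...], "Location": [...]} into a list of row dictionaries."""
    keys = list(columns)
    # zip(*lists) walks all lists in lockstep and stops at the shortest one.
    return [dict(zip(keys, values)) for values in zip(*(columns[k] for k in keys))]


# Sample data in the original "one list per field" layout.
data = {
    "Date": ["2021-08-24 21:00:00", "2021-08-24 21:00:00"],
    "Location": ["Indonesia - TERNATE/BABULLAH", "Bolivia - TRINIDAD"],
}

rows = columns_to_rows(data)
for row in rows:
    print(row)
```

Because the function reads the keys from the data, it works for any number of fields, as long as every list has the same length.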
You can drop the jello query directly into your current pipeline. There is no need for another file. I was just using cat data.json as a placeholder for your existing pipeline. Simply insert the jello part between your pipeline and jtbl with another pipe:
$ <existing bash commands> | jello '\
result = []
for date, location in zip(_.Date, _.Location):
    result.append({"date": date, "location": location})
result' | jtbl
I was taking a look at your Bash script and it looks like you would like to add the table functionality here:
print() {
    cod=$(echo "$1" | tr '[:lower:]' '[:upper:]')
    jello < "$WE_DIR"/data/"$cod".json
}
table() {
    cod=$(echo "$1" | tr '[:lower:]' '[:upper:]')
    #cat "$WE_DIR"/data/"$cod".json | jello '\
    #result = []
    #for d, t, x in zip(_.Date, _.Title, _.Details):
    #    result.append({"Date": d, "Title: t, "Details": x})
    #result' | jtbl -n
    printf "Table output not implemented yet. :("
}
If your print function works ok, then it seems you could do something like this:
table() {
    cod=$(echo "$1" | tr '[:lower:]' '[:upper:]')
    jello '\
result = []
for d, t, x in zip(_.Date, _.Title, _.Details):
    result.append({"Date": d, "Title: t, "Details": x})
result' < "$WE_DIR"/data/"$cod".json | jtbl -n
}
Note, you cannot indent the jello lines to match the rest of the function, because the formatting is important to the Python interpreter.
I don't intend to turn this issue into a support thread for my own script (though I wouldn't mind the help!) but I'm getting an error with the table() function you posted. This is something I tried myself after your first post (I remember I tried without the indentation too at the time), but it failed.
With the above table(), I get this:
$ we -t aat
jello: Query Exception: SyntaxError
invalid syntax (<unknown>, line 4)
result.append({"Date": d, "Title: t, "Details": x})
query: \\nresult = []\nfor d, t, x in zip ... "Title: t, "Details": x})\nresult
data: {'Date': ['2021-08-25 07:54:16'], ... .org/eventList/details/111229/0']}
But the more I think about it, the more I think the format that jtbl expects natively would indeed be more appropriate for me. Therefore, the best way might actually be to do the jello conversion directly in my .py script so that it saves the json file in the proper format. Everything that has to do with fiddling with the Python code concerns me, though!
No worries! Looks like we are just missing a quotation mark for Title. This should work:
table() {
    cod=$(echo "$1" | tr '[:lower:]' '[:upper:]')
    jello '\
result = []
for d, t, x in zip(_.Date, _.Title, _.Details):
    result.append({"Date": d, "Title": t, "Details": x})
result' < "$WE_DIR"/data/"$cod".json | jtbl -n
}
Nice script!
Nice, now that works! I integrated it in devel, but I'm still wondering if it wouldn't be better to just alter the Python script and directly save data in a more appropriate json structure.
Thanks for the kind words. This is merely an experiment; I don't even have a real use for it, but maybe one day, if it becomes feature complete and I can ascertain the reliability of the website I scrape. I'm currently trying to add some fzf magic into devel to better handle codes and suggest them when none is provided, but I don't see myself progressing much in the actual todo list in the near future since most of it probably depends on the Python part.
I started in Bash and then started learning Python a few years ago with similar projects like yours. To me, understanding both is very liberating - Python is so powerful, but not too hard to learn. Keep it up!
You could actually just drop this part into your Python script:
result = []
for d, t, x in zip(myvar["Date"], myvar["Title"], myvar["Details"]):
    result.append({"Date": d, "Title": t, "Details": x})
Just change myvar to whatever variable name corresponds to the JSON data (more correctly, dictionary) in the script. Then just use result instead of the original variable.
Notice I just changed the code to use ["Date"] instead of .Date. This is because jello does some fancy stuff behind the scenes to allow dot notation, but this is not a native Python feature for accessing dictionary keys.
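To illustrate the difference, here is a minimal sketch of how attribute-style access could be layered on top of a dictionary. DotDict is a hypothetical helper written for this example; it is not how jello is implemented, only a demonstration that plain dictionaries need bracket access.

```python
class DotDict(dict):
    """Hypothetical helper: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


plain = {"Date": ["2021-08-25 07:54:16"]}
dotted = DotDict(plain)

print(plain["Date"])  # bracket access works on any dictionary
print(dotted.Date)    # attribute access only works through the wrapper

try:
    plain.Date        # a plain dict has no such attribute
except AttributeError as exc:
    error = exc
    print("plain dict raised AttributeError:", exc)
```

In a regular script, sticking with myvar["Date"] avoids the need for any wrapper.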
I am getting an error with that but I suppose this is just me failing to properly merge your suggestion into the existing Python code:
$ WE_DIR=(pwd) scripts/./AAT.py 4 changed files devel
Traceback (most recent call last):
File "/home/mathieu/Projects/worldevents/scripts/./AAT.py", line 24, in <module>
for d, l, t, x in zip(json_data["Date"], json_data["Location"], json_data["Title"], json_data["Details"]):
TypeError: string indices must be integers
…
json_data = json.dumps({"Date": AAT, "Location": locs, "Title": titles, "Details": details})
result = []
for d, l, t, x in zip(json_data["Date"], json_data["Location"], json_data["Title"], json_data["Details"]):
    result.append({"Date": d, "Location": l, "Title": t, "Details": x})
print(result, file=open(wedir+'/data/AAT.json', 'a'))
print(f'Appended {len(AAT)} event(s) to {wedir}/data/AAT.json. \033[32;1m✔\033[0m')
As I'm not sure about the difference between the json variable and a json dictionary, I also tried replacing json_data with AAT, locs, titles and details, respectively, in the for d, l, t, x line, but no dice. I need to read about what those commands do and expect.
Ah yes, it is a bit confusing between JSON and a Python Dictionary at first. Basically a Python Dictionary is the data structure and you can load JSON directly into a Dictionary or dump a Dictionary to a JSON string.
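A short round trip makes the distinction concrete. The sample values below are placeholders chosen for this sketch; json.dumps and json.loads are the standard library functions.

```python
import json

# Placeholder data standing in for the scraped lists.
data = {"Date": ["2021-08-25 07:54:16"], "Title": ["Example event"]}

text = json.dumps(data)      # dictionary -> JSON string
print(type(text).__name__)   # the result is a str, not a dict

try:
    text["Date"]             # a string cannot be indexed by a key
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)

restored = json.loads(text)  # JSON string -> dictionary
print(restored["Date"])
```

Indexing the string with a key raises the same kind of TypeError as in the traceback above.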
In this case it is not working because json_data is a string, not a Dictionary. This is because the json.dumps function returns a JSON-formatted string.
To make this work you would do this instead:
json_data = {"Date": AAT, "Location": locs, "Title": titles, "Details": details}
result = []
for d, l, t, x in zip(json_data["Date"], json_data["Location"], json_data["Title"], json_data["Details"]):
    result.append({"Date": d, "Location": l, "Title": t, "Details": x})
result = json.dumps(result)
print(result, file=open(wedir+'/data/AAT.json', 'a'))
print(f'Appended {len(AAT)} event(s) to {wedir}/data/AAT.json. \033[32;1m✔\033[0m')
This way json_data is just a Dictionary and then we finally convert result to a JSON string so it can be printed.
I didn't rename the variables just to keep things consistent, but in this case it might make sense to rename them because json_data is no longer JSON, it is a Dictionary. So something simple like data might be better. Then you could change the final result name to something like json_data, since it is a JSON string.
Note, even this is probably not the most efficient way to do this since we are creating a dictionary and reformatting it, but it gets the job done. Without getting too deep into it, you could probably just generate the correct data structure in the first place with something like:
result = []
for d, l, t, x in zip(AAT, locs, titles, details):
    result.append({"Date": d, "Location": l, "Title": t, "Details": x})
This way you can get rid of json_data completely. Sorry, sometimes it takes a few iterations for me to see how to make it more efficient! :)
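Putting the pieces together, a self-contained sketch of building the rows and saving them could look like this. The list contents and the temporary output path are placeholders invented for the example; the real script fills AAT, locs, titles and details from the scrape and writes under wedir. Unlike the snippets above, this version overwrites the file instead of appending to it.

```python
import json
import os
import tempfile

# Placeholder values standing in for the scraped lists.
AAT = ["2021-08-25 07:54:16"]
locs = ["United States, North America"]
titles = ["Example title"]
details = ["https://example.org/details/1"]

keys = ("Date", "Location", "Title", "Details")
# Build one dictionary per event directly from the parallel lists.
result = [dict(zip(keys, row)) for row in zip(AAT, locs, titles, details)]

# A temporary directory stands in for the real data directory here.
path = os.path.join(tempfile.mkdtemp(), "AAT.json")
with open(path, "w") as handle:  # the with-block closes the file afterwards
    json.dump(result, handle, indent=2)

# Read the file back to confirm it holds one valid JSON array.
with open(path) as handle:
    reloaded = json.load(handle)
print(reloaded)
```

Using json.dump with a with-block writes the JSON straight to the file and closes it, so there is no separate json.dumps and print step.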
Awesome! Now the json file indeed is much easier to read, and I can simplify the bash script:
print() {
    if [ -z "$1" ]; then
        cod="$(fzf < "$WE_DIR"/setup/codes.txt)"
        jello < "$WE_DIR"/data/"$cod".json
    else
        cod=$(echo "$1" | tr '[:lower:]' '[:upper:]')
        jello < "$WE_DIR"/data/"$cod".json
    fi
}
table() {
    print "$1" | jtbl -n | less -S
}
This is getting exciting! I need to focus on better implementing fzf (handle multi selections, offer only available data and not the full list, possibly prompt for scraping if not already done) and adding human-readable categories, but on the Python side I think my main issue will be when repeating requests multiple times: this appends the new result to the existing json file, but won't check for duplicates. I'll check your repositories, I see that there are a lot of json manipulation tools and maybe there's already something to deal with this issue.
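One possible way to handle the duplicate problem on the Python side is to merge new events into the already saved ones and skip anything seen before. This is only a sketch: merge_events is a made-up name, the titles are placeholders, and it assumes the "Details" URL uniquely identifies an event, which the thread does not confirm.

```python
def merge_events(existing, new, key="Details"):
    """Return existing + new, skipping new records whose key value was already seen."""
    seen = {record[key] for record in existing}
    merged = list(existing)
    for record in new:
        if record[key] not in seen:
            seen.add(record[key])
            merged.append(record)
    return merged


# Records from a previous run (as loaded from the json file).
saved = [
    {"Title": "Event A", "Details": "https://rsoe-edis.org/eventList/details/113189/0"},
]
# Records from a new scrape: one repeat and one genuinely new event.
scraped = [
    {"Title": "Event A", "Details": "https://rsoe-edis.org/eventList/details/113189/0"},
    {"Title": "Event B", "Details": "https://rsoe-edis.org/eventList/details/113112/0"},
]

merged = merge_events(saved, scraped)
print(len(merged))
```

The merged list can then be written back to the file in one go, replacing the old contents rather than appending to them.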
I now have something almost functional but noticed something weird with jtbl -t:
WTR.json
--------
latitude longitude time title location details source category description
---------- ----------- ------------ ------------ ------------ ------------ ------------ ------------ -------------
-35.015 -55.3376 2021-08-30 1 Uruguay - EV Uruguay, Sou https://rsoe http://www.m Traffic inci Container sh
-4.46859 -74.124 2021-08-30 0 Peru - Over Peru, South https://rsoe https://www. Traffic inci More than 20
38.8853 1.43524 2021-08-29 1 Spain - Fift Spain, Europ https://rsoe https://www. Traffic inci Fifteen peop
AAT.json
--------
latitude longitude time title location details source category description
---------- ----------- ------------ ------------ ------------ ------------ ------------ ------------ -------------
30.2762 -89.7816 2021-08-31 1 United State United State https://rsoe https://eu.u Biological o Hurricane Id
Notice how the columns are not ordered in the same way as in the raw json files:
WTR.json
--------
{
"time": "2021-08-30 14:39:50",
"title": "Uruguay - EVER container ship accident in Rio de la Plata, Uruguay",
"location": "Uruguay, South America",
"details": "https://rsoe-edis.org/eventList/details/113189/0",
"source": "http://www.maritimebulletin.net/2021/08/30/ever-container-ship-accident-in-rio-de-la-plata-uruguay/",
"category": "Traffic incident - Water accident",
"latitude": "-35.015046",
"longitude": "-55.337593",
"description": "Blah blah."
}
{
"time": "2021-08-30 09:22:42",
"title": "Peru - Over 20 dead, dozens missing after vessel collision in Peru",
"location": "Peru, South America",
"details": "https://rsoe-edis.org/eventList/details/113112/0",
"source": "https://www.bignewsnetwork.com/news/270936218/over-20-dead-dozens-missing-after-vessel-collision-in-peru?utm_source=feeds.bignewsnetwork.com&utm_medium=referral",
"category": "Traffic incident - Water accident",
"latitude": "-4.468586",
"longitude": "-74.12399",
"description": "Blah blah."
}
{
"time": "2021-08-29 10:29:21",
"title": "Spain - Fifteen injured in Ibiza ferry accident",
"location": "Spain, Europe",
"details": "https://rsoe-edis.org/eventList/details/112817/0",
"source": "https://www.majorcadailybulletin.com/news/local/2021/08/29/88771/ibiza-ferry-accident-leaves-fifteen-injured.html",
"category": "Traffic incident - Water accident",
"latitude": "38.88534",
"longitude": "1.435239",
"description": "Blah blah."
}
AAT.json
--------
{
"time": "2021-08-31 10:29:55",
"title": "United States - Man attacked by alligator in flooded Louisiana waters after Hurricane Ida",
"location": "United States, North America",
"details": "https://rsoe-edis.org/eventList/details/113530/0",
"source": "https://eu.usatoday.com/story/news/nation/2021/08/30/hurricane-ida-man-attacked-alligator-flooded-louisiana-waters/5660363001/",
"category": "Biological origin - Animal attack",
"latitude": "30.27621",
"longitude": "-89.78162",
"description": "Blah blah."
}
Any idea what may cause this? The tables are generated this way (${typ} is an array of files selected in fzf):
for i in "${typ[@]}"
do printf '\n%s\n--------\n' "$i"
jtbl -t < "$i"
done
I also noticed that cat {} | jtbl -t as fzf preview command does not make the order of columns consistent, despite the raw json files all being structured the same way. For some files, columns are in the same order as variables in the json files, sometimes they are mixed like above. Printing those same files that show differently with the fzf preview in all cases yields the mixed columns above, so there must be something different between cat {} | jtbl -t and jtbl -t < "$i" (where i is looped through the array), although they both sort columns differently than the json file.
Yes, it is true that jtbl doesn't attempt to keep the ordering of columns while it tries to truncate or resize them to fit. I believe (but am not sure) the columns will keep their original order when using the -n option, which skips the column resizing logic.
I'd have to dig in a little further to see if there is a way to preserve ordering, but it would take me a while to understand the resizing code as I did that a while back and it was a bit complex. :)
Also, note that field ordering has no intrinsic importance in JSON. An API may change the order of fields at any time and even fields between records may not be in the same order, so it is hard to define what the behavior should be. I suppose jtbl could see the ordering of the first object and then keep everything the same as that.
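As a rough illustration of that idea, column names can be collected in the order they are first seen, so the first object sets the order and later objects only add columns that were missing. The function name column_order and the shortened sample rows are assumptions for this sketch; it does not show what jtbl actually does.

```python
def column_order(rows):
    """Collect column names in the order they are first seen across all rows."""
    order = {}
    for row in rows:
        for key in row:
            # Dictionaries keep insertion order (Python 3.7+), so this acts as an ordered set.
            order.setdefault(key, None)
    return list(order)


rows = [
    {"time": "2021-08-30 14:39:50", "title": "A", "latitude": "-35.015046"},
    # Same fields in a different order, plus one field the first row lacks.
    {"title": "B", "time": "2021-08-30 09:22:42", "longitude": "-74.12399"},
]

columns = column_order(rows)
print(columns)
```

Here the second record's different field order has no effect, and its extra field is simply added at the end.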
Looking at the data it looks like the general order of columns is smallest to longest.
That is, jtbl checks all of the rows (including the header), finds the largest cell for each column, truncates/resizes the columns, and prints them in order from the column whose largest cell is shortest to the column whose largest cell is longest.
Hope that makes sense! :)
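The effect described above can be reproduced with a few lines of Python. This is a simplified model written from the description in this thread, not jtbl's actual resizing code, and order_by_widest_cell is a hypothetical name.

```python
def order_by_widest_cell(rows):
    """Order columns from the narrowest widest-cell to the widest widest-cell."""
    widths = {}
    for row in rows:
        for key, value in row.items():
            # Start from the header width, then grow to the widest cell seen.
            widths[key] = max(widths.get(key, len(key)), len(str(value)))
    # sorted() is stable, so columns with equal widths keep their original order.
    return sorted(widths, key=widths.get)


rows = [
    {
        "time": "2021-08-30 14:39:50",
        "title": "Uruguay - EVER container ship accident in Rio de la Plata, Uruguay",
        "latitude": "-35.015046",
    },
]

ordered = order_by_widest_cell(rows)
print(ordered)
```

Under this model short numeric fields such as latitude move to the front and long text fields end up last, which matches the reordering seen in the tables above.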
Thanks. That's exactly what I thought after looking into it again after my post, with the "description" column always being the last and "latitude"/"longitude" often being first. I also saw that jtbl -n doesn't show the same reordering. I guess jtbl -t may process columns differently when inside an fzf preview and when having the full terminal width available; this would explain the bottom of my previous post.
Do you plan on adding an option to preserve the original order in future updates (simple question, no expectations)?
I opened a feature request to investigate how to do this. (#7)
Looks like I have the column reordering issue fixed in the dev branch. Continuing to test, but I should be able to release this fix in the next version shortly.
% echo '{
"time": "2021-08-30 14:39:50",
"title": "Uruguay - EVER container ship accident in Rio de la Plata, Uruguay",
"location": "Uruguay, South America",
"details": "https://rsoe-edis.org/eventList/details/113189/0",
"source": "http://www.maritimebulletin.net/2021/08/30/ever-container-ship-accident-in-rio-de-la-plata-uruguay/",
"category": "Traffic incident - Water accident",
"latitude": "-35.015046",
"longitude": "-55.337593",
"description": "Blah blah."
}
{
"time": "2021-08-30 09:22:42",
"title": "Peru - Over 20 dead, dozens missing after vessel collision in Peru",
"location": "Peru, South America",
"details": "https://rsoe-edis.org/eventList/details/113112/0",
"source": "https://www.bignewsnetwork.com/news/270936218/over-20-dead-dozens-missing-after-vessel-collision-in-peru?utm_source=feeds.bignewsnetwork.com&utm_medium=referral",
"category": "Traffic incident - Water accident",
"latitude": "-4.468586",
"longitude": "-74.12399",
"description": "Blah blah."
}
{
"time": "2021-08-29 10:29:21",
"title": "Spain - Fifteen injured in Ibiza ferry accident",
"location": "Spain, Europe",
"details": "https://rsoe-edis.org/eventList/details/112817/0",
"source": "https://www.majorcadailybulletin.com/news/local/2021/08/29/88771/ibiza-ferry-accident-leaves-fifteen-injured.html",
"category": "Traffic incident - Water accident",
"latitude": "38.88534",
"longitude": "1.435239",
"description": "Blah blah."}' | jq -c | jtbl
╒══════════════════╤══════════════════╤══════════════════╤══════════════════╤══════════════════╤══════════════════╤════════════╤═════════════╤═══════════════╕
│ time │ title │ location │ details │ source │ category │ latitude │ longitude │ description │
╞══════════════════╪══════════════════╪══════════════════╪══════════════════╪══════════════════╪══════════════════╪════════════╪═════════════╪═══════════════╡
│ 2021-08-30 14:39 │ Uruguay - EVER c │ Uruguay, South A │ https://rsoe-edi │ http://www.marit │ Traffic incident │ -35.015 │ -55.3376 │ Blah blah. │
│ :50 │ ontainer ship ac │ merica │ s.org/eventList/ │ imebulletin.net/ │ - Water acciden │ │ │ │
│ │ cident in Rio de │ │ details/113189/0 │ 2021/08/30/ever- │ t │ │ │ │
│ │ la Plata, Urugu │ │ │ container-ship-a │ │ │ │ │
│ │ ay │ │ │ ccident-in-rio-d │ │ │ │ │
│ │ │ │ │ e-la-plata-urugu │ │ │ │ │
│ │ │ │ │ ay/ │ │ │ │ │
├──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────┼─────────────┼───────────────┤
│ 2021-08-30 09:22 │ Peru - Over 20 d │ Peru, South Amer │ https://rsoe-edi │ https://www.bign │ Traffic incident │ -4.46859 │ -74.124 │ Blah blah. │
│ :42 │ ead, dozens miss │ ica │ s.org/eventList/ │ ewsnetwork.com/n │ - Water acciden │ │ │ │
│ │ ing after vessel │ │ details/113112/0 │ ews/270936218/ov │ t │ │ │ │
│ │ collision in Pe │ │ │ er-20-dead-dozen │ │ │ │ │
│ │ ru │ │ │ s-missing-after- │ │ │ │ │
│ │ │ │ │ vessel-collision │ │ │ │ │
│ │ │ │ │ -in-peru?utm_sou │ │ │ │ │
│ │ │ │ │ rce=feeds.bignew │ │ │ │ │
│ │ │ │ │ snetwork.com&utm │ │ │ │ │
│ │ │ │ │ _medium=referral │ │ │ │ │
├──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────┼─────────────┼───────────────┤
│ 2021-08-29 10:29 │ Spain - Fifteen │ Spain, Europe │ https://rsoe-edi │ https://www.majo │ Traffic incident │ 38.8853 │ 1.43524 │ Blah blah. │
│ :21 │ injured in Ibiza │ │ s.org/eventList/ │ rcadailybulletin │ - Water acciden │ │ │ │
│ │ ferry accident │ │ details/112817/0 │ .com/news/local/ │ t │ │ │ │
│ │ │ │ │ 2021/08/29/88771 │ │ │ │ │
│ │ │ │ │ /ibiza-ferry-acc │ │ │ │ │
│ │ │ │ │ ident-leaves-fif │ │ │ │ │
│ │ │ │ │ teen-injured.htm │ │ │ │ │
│ │ │ │ │ l │ │ │ │ │
╘══════════════════╧══════════════════╧══════════════════╧══════════════════╧══════════════════╧══════════════════╧════════════╧═════════════╧═══════════════╛
Thanks for this tool! Would there be a way to make jtbl recognize that values separated by commas within square brackets should be on new lines, with json files of the following structure?
Currently, when I pipe that into jtbl, it just shows a table with two columns and a single row containing all values separated by commas.