jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Create Multi Smaller Files From a Big Json File #3121

Closed washere closed 1 month ago

washere commented 1 month ago

If a JSON file format is like this, example of first 2 notes:


[
   {
      "_id": 1,
      "title": "First Note",
      "note": "Blurb blurb blurb blurb \n blurb blurb blurb \n",
      "category": 6
   },
   {
      "_id": 2,
      "title": "My Second Thought",
      "note": "Blah blah blah \n blah blah \n\n\n",
      "category": 3
   }
]

Can we have each as a separate file, so this would be the first txt file:


      1
      First Note
      Blurb blurb blurb blurb blurb \n\n\n\n
      6

and so on for all notes. Is it possible to do this in jq or am I wasting my time? Thanks.

pkoppstein commented 1 month ago

The bad news is that jq alone is not up to the job.

The good news is that jq was designed to, and does, work well with other command-line tools.

For details, see e.g. https://stackoverflow.com/questions/70569726/jq-split-json-in-several-files

washere commented 1 month ago

Thanks buddy. I was thinking of awk & sed. RegExp groupings (\1, \2, \3, etc.) would be mighty cool in jq if they are on the roadmap. Thanks again. :+1:

pkoppstein commented 1 month ago

@washere wrote:

I was thinking of awk & sed.

jq and awk work very well together for the use-case you mention.
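
As a rough sketch of how jq and awk might be combined for this (an illustration, not necessarily the approach meant above): jq flattens each note to one tab-separated line, and awk writes the fields of each line to a separate file. The helper name split_notes, the output directory argument and the note-<_id>.txt naming are assumptions made up for this example; the field names come from the sample JSON in the question.

```shell
# split_notes INPUT.json OUTDIR   (hypothetical helper)
# Writes one text file per array element, four lines each.
split_notes() {
  mkdir -p "$2" || return 1
  # @tsv puts each note on one line and escapes newlines and tabs
  # inside strings as \n and \t, so a note cannot break the format.
  jq -r '.[] | [._id, .title, .note, .category] | @tsv' "$1" |
  awk -F '\t' -v dir="$2" '{
    out = dir "/note-" $1 ".txt"
    printf "%s\n%s\n%s\n%s\n", $1, $2, $3, $4 > out
    close(out)   # keep the number of open files small
  }'
}
```

It could then be called as split_notes mynotes.json notes-out, which would leave notes-out/note-1.txt, notes-out/note-2.txt and so on.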

RegExp groupings

Not sure what you're referring to, but please note that jq does support (nested) regex groupings by name, e.g.

jq -n '"January 3rd, 2020" | capture("(?<month>(?<mon>[^ ]{3})[^ ]+) (?<day>[0-9]+)[^, ]*, *(?<year>[0-9]+)")'
{
  "month": "January",
  "mon": "Jan",
  "day": "3",
  "year": "2020"
}

Since these jqlang "issues" pages are mainly for reporting bugs and requesting enhancements, we generally ask that usage questions and the like be posted to https://stackoverflow.com/questions/tagged/jq where you'll likely get timely and useful responses. If you have a specific ER to make, by all means do so; otherwise, please consider closing this "issue".

washere commented 1 month ago

Thanks. I actually already did the task. In Sublime Text I deleted the blank lines and the lines containing just { or } etc. The find/replace fields in Sublime can contain line breaks (Ctrl+Return).

I deleted the 1st line of each note (the .db primary key) with a regexp group (you need to click the regexp icon next to the find/replace fields):

"_id": (.*),\n

.* matches whatever is there.

I renamed category, replacing:

      "category": (.*)

with

      "TYPE": \1

\1 being whatever .* matched.
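
For anyone who prefers the command line, the same two edits can be sketched with sed. This is an assumed equivalent of the Sublime Text steps, not what was actually run; the sample text is made up from the JSON in the first post.

```shell
# Hypothetical sample: three lines of one pretty-printed note.
sample='      "_id": 1,
      "title": "First Note",
      "category": 6'

# Delete the "_id" line, then rename the "category" key.
# \1 in the replacement stands for whatever (.*) matched.
result=$(printf '%s\n' "$sample" |
  sed -E -e '/"_id": .*,$/d' -e 's/"category": (.*)/"TYPE": \1/')

printf '%s\n' "$result"
```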

So ended up with 3 lines per note. Some note lines (2nd line) being 150,000 characters!

Then just used:

split -l 3 mynotes.json Note-

ie: create a new file called Note-xxx from every 3 lines (-l 3). It created about 900 big text files in a few seconds.
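
A small sketch of what split -l 3 does, run in a throwaway directory. The file contents are invented for the example; only the split call itself mirrors the command above.

```shell
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)

# Hypothetical input: two notes, three lines each.
printf '%s\n' \
  '"title": "First Note",' '"note": "Blurb blurb",' '"TYPE": 6' \
  '"title": "My Second Thought",' '"note": "Blah blah",' '"TYPE": 3' \
  > "$workdir/mynotes.txt"

# One output file per 3 input lines: Note-aa, Note-ab, ...
split -l 3 "$workdir/mynotes.txt" "$workdir/Note-"

ls "$workdir"
```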

It worked great. Writing here so might help someone in future searching closed issues.

Although this can be a label (Support or Feature Request), I agree with you so closing this. Also thanks for the actual support link, would have never found it! Will keep an eye on jq. Thanks again :+1:

washere commented 1 month ago

In case anyone in future needs this:

Because each note's content sits on one line (a single field of a SQLite table, one row per record, before exporting to JSON), there will be lots of literal:

\n \r plus some: \t

To change them into actual newlines, the trick is to escape the backslash in the sed pattern: \\n matches a backslash followed by n, while \n in the replacement is a real newline in GNU sed:


find . -type f -exec sed -i 's/\\n/\n/g' {} \;
find . -type f -exec sed -i 's/\\r/\n/g' {} \;
find . -type f -exec sed -i 's/\\t/\n/g' {} \;

This one gets rid of multiple blank lines, so that the maximum number of blank lines in a row will be two:

find . -type f -exec sed -i '/^$/N;/\n$/N;/\n$/D' {} \;
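
To try the idea safely before touching real files, here is a small sketch that runs on a single made-up string instead of using sed -i. It assumes GNU sed (where \n in the replacement is a newline); the sample note is invented.

```shell
# Hypothetical one-line note with literal \r\n, \n and \t sequences.
note='Line one\r\nLine two\n\n\n\nLine three\tend'

# Step 1: turn each literal \n, \r and \t into a real newline.
# Step 2: collapse runs of blank lines down to at most two.
cleaned=$(printf '%s\n' "$note" |
  sed -e 's/\\n/\n/g' -e 's/\\r/\n/g' -e 's/\\t/\n/g' |
  sed -e '/^$/N;/\n$/N;/\n$/D')

printf '%s\n' "$cleaned"
```

The two steps are separate sed processes on purpose: the second one has to see the newly created lines one at a time.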

The above commands operate on ALL FILES in the directory you're in (pwd), including its subdirectories. To apply them to another folder, add the folder path right after find. But it's easier to be in the directory where the files are and just run the commands there. Test on backup copies first: delete the results and copy the files back to try again until they look right.

Then notes look nice, good luck.

P.S. This code takes first line of each file (note title) & names the file with it:

Make sure your terminal is in the folder where all the files are; it saves typing the path in commands.

Just paste the whole chunk into a terminal & press Return to run it:


for file in *
do
   # Avoid renaming directories!
   if [ -f "$file" ]
   then
       a=$(head -n 1 "$file")   # first line: note title
       b=$(tail -n 1 "$file")   # last line: category
       # Replace anything outside A-Za-z0-9._- with an underscore
       newname=$(printf '%s' "${a} ${b}.txt" | sed -e 's/[^A-Za-z0-9._-]/_/g')
       if [ -e "$newname" ]
       then
              echo "Cannot rename $file to $newname - file already exists"
       else
              mv -- "$file" "$newname"
       fi
   fi
done

The line that sets b reads the last line of each file (the category); I just append it to the filename, and you can delete it.
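
The file name clean-up step can be tried on its own. A minimal sketch, with a made-up title and category standing in for the first and last lines of a note file:

```shell
# Hypothetical title and category, as read from a note file.
title='My Second Thought: part 1/2'
category='3'

# Every character outside A-Z, a-z, 0-9, dot, underscore and hyphen
# becomes an underscore, so the result is safe as a file name.
newname=$(printf '%s' "${title} ${category}.txt" |
  sed -e 's/[^A-Za-z0-9._-]/_/g')

printf '%s\n' "$newname"
```

Slashes and colons in a title would otherwise break mv or create odd paths, which is why the substitution runs before renaming.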

P.P.S.: The find ... sed commands above, which replace the backslash sequences (\n, \t etc.) in SQLite single-line output with actual newlines, can be erratic and behave inconsistently. So for mass grep & replace across multiple files I recommend regexxer instead. It can be found in the GNOME Software, Mintinstall or Discover app stores, in the Synaptic package manager, or as a direct download from SourceForge:

https://regexxer.sourceforge.net/

https://mail.gnome.org/archives/gnome-announce-list/2004-July/msg00022.html