jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Is it possible to add a random selector? #677

Open ashleycoker opened 9 years ago

ashleycoker commented 9 years ago

Take this as an example:

curl 'https://query.yahooapis.com/v1/public/yql?q=SELECT%20*%20FROM%20xml%20WHERE%20url%3D%22http%3A%2F%2Fopenmenu.com%2Fmenu%2F21573c8e-15bb-11e0-b40e-0018512e6b26%22&format=json' | jq '[.query.results.omf.menus.menu.menu_groups.menu_group[].menu_items.menu_item[] | {title: .menu_item_name, price: .menu_item_price, desc: .menu_item_description, type: "food"}]'

I would like to be able to add another field to the resulting object "related" which is a random number of random selections of the other titles. For example a single object in the resulting array may look like:

{
   "title": "FRESH SNOWPEAS",
   "price": "123",
   "desc": "a desc",
   "type": "food",
   "related": [{"title": "SAUTEED SPINACH"}, {"title": "CRIMINI MUSHROOMS"}]
}

The related field must not contain itself, and must contain between say 1 and 5 other items.

I have read ALL the docs and can't find any way to select at random. Surely it must be possible? Thanks in advance.

pkoppstein commented 9 years ago

@ashleycoker asked:

Surely it must be possible?

I have also asked for a PRNG, and it is likely to happen (eventually), but in the meantime, you might like to use an existing PRNG for jq (see http://rosettacode.org/wiki/Linear_congruential_generator#jq), or write your own. Perhaps http://rosettacode.org/mw/index.php?title=Van_der_Corput_sequence#jq may also be of interest.

joelpurra commented 9 years ago

@pkoppstein wrote:

I have also asked for a PRNG, and it is likely to happen (eventually), but in the meantime, you might like to use an existing PRNG for jq (see http://rosettacode.org/wiki/Linear_congruential_generator#jq), or write your own. Perhaps http://rosettacode.org/mw/index.php?title=Van_der_Corput_sequence#jq may also be of interest.

Very nice! Until there is a proper internal implementation, would you mind jqnpm generate-ing packages for these algorithms? Otherwise I could package them.

pkoppstein commented 9 years ago

@joelpurra wrote:

Otherwise I could package them.

Please feel free to do so. I was wondering whether it would be possible/easy/sensible to arrange for these snippets from rosettacode.org to be grouped together under "rosettacode" (or just "rc").

Rosettacode.org has the GNU Free Documentation License 1.2. I am a bit fuzzy on what that means for executable code. Perhaps some form of dual-licensing would be possible? Anyway, I am the author of these particular snippets in case that helps.

ashleycoker commented 9 years ago

Forgive me as I am new to this. I managed to generate the pseudo-random numbers from http://rosettacode.org/wiki/Linear_congruential_generator#jq but I still cannot fathom how I might generate the data structure I require from the example curl input shown.

I can generate a list of what appear to be random numbers - fine. But my input has 42 items. How can I select one at random?

How can I then select randomly between 1-5 random unique items from the same list for the 'related' array of each item?

ashleycoker commented 9 years ago

Any ideas on this one? Thanks.

pkoppstein commented 9 years ago

The following illustrates how to select at random within the confines of jq version 1.4, using decimal digits read from /dev/urandom as the source of randomness:

Usage: jq -c --arg urandom "$(head -c "$N" /dev/urandom | tr -cd '0-9')" -f select_at_random.jq

The value of $N (a number of bytes) in the line above must be chosen carefully. For generating m random numbers in range(0;n), 300 * m * log10(n) should suffice.

Example: jq -c --arg urandom "$(head -c 1000 /dev/urandom | tr -cd '0-9')" -f select_at_random.jq
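
To see why $N has to be generous: tr -cd '0-9' deletes every byte that is not an ASCII digit, and only 10 of the 256 possible byte values are digits, so on average roughly N * 10 / 256 digits survive. A small sketch of that filtering step, using a fixed input string instead of /dev/urandom so the result is repeatable (the variable names are only for illustration):

```shell
# -d deletes, -c takes the complement of the set 0-9,
# so everything except ASCII digits is removed.
digits=$(printf 'a1b2-c3\n' | tr -cd '0-9')
echo "$digits"        # 123

# Expected number of digits left over from N uniformly random bytes:
# 10 of the 256 byte values are digits.
N=1000
expected=$(( N * 10 / 256 ))
echo "$expected"      # 39 (integer division)
```

So the 1000 bytes in the example above yield only about 39 usable digits on average, and the actual count varies from run to run.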

# Input: [_, seed]
# Output: [ selection, newseed ]
# Select a number in range(0;n) at random
def at_random(n):
  n as $n
  | ($n - 1 | tostring | length) as $count
  | .[1] as $ix
  | if ($ix + $count) > ($urandom|length) then error( "insufficient entropy" )
    else ($urandom[$ix:$ix+$count] | tonumber) as $trial
    | if $trial >= n then [0, $ix+1] | at_random(n)
      else [ $trial, ($ix + $count) ]
      end
    end;

# Select m without replacement from the input array.
# Input: [ inputarray, seed ] or [ inputarray ] (the seed defaults to 0)
# Output: [ selected, newseed ]
def at_random_without_replacement(m):
  # input/output: [ selected, remaining, seed ]
  def arwr(m):
    if m == 0 then .
    else .[0] as $selected
    | .[1] as $remaining
    | .[2] as $ix
    | ($remaining|length) as $n
    | [0, $ix] | at_random($n)
    | .[0] as $j
    | [ $selected + [$remaining[$j]],
        $remaining[0 : $j] + $remaining[$j+1 :],
        .[1] ] | arwr(m-1)
    end;

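  if m > (.[0]|length) then error("The given array does not have \(m) items")
  # arwr emits [ selected, remaining, seed ]; drop "remaining" so that the
  # output is [ selected, newseed ] as documented above.
  else [[], .[0], (.[1]//0)] | arwr(m) | [.[0], .[2]]
  end;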
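
To make the rejection step in at_random easier to follow, here is a sketch that runs it against a fixed digit string instead of /dev/urandom. The file name at_random_demo.jq and the digit string 971215 are made up for the illustration. For n = 20 the function reads two digits at a time: "97" and "71" are rejected because they are >= 20 (each rejection advances the position by one digit), "12" is accepted, and the next draw starts right after it.

```shell
# Copy of at_random from this comment, plus a two-draw driver.
cat > at_random_demo.jq <<'EOF'
def at_random(n):
  n as $n
  | ($n - 1 | tostring | length) as $count
  | .[1] as $ix
  | if ($ix + $count) > ($urandom|length) then error("insufficient entropy")
    else ($urandom[$ix:$ix+$count] | tonumber) as $trial
    | if $trial >= $n then [0, $ix+1] | at_random($n)
      else [ $trial, ($ix + $count) ]
      end
    end;

# Two consecutive draws from range(0;20), passing the new seed
# (the position in $urandom) from the first draw to the second.
[0, 0] | at_random(20) as $a | ($a | at_random(20)) as $b | [$a, $b]
EOF

# -n: the program builds its own input, so jq should not read stdin.
draws=$(jq -n -c --arg urandom 971215 -f at_random_demo.jq)
echo "$draws"   # [[12,4],[15,6]]
```

Each result is [ selection, newseed ], so the first draw is 12 with the position advanced to 4, and the second is 15 with the position advanced to 6.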

Example 1: to generate 20 random numbers in range(0;20), that is 0 to 19 inclusive, with replacement, and discarding the state of the PRNG:

reduce range(0;20) as $i ([[],0];
   at_random(20) as $next | [ .[0] + [$next[0]], $next[1] ]) | .[0]

Example 2: to select a random permutation of range(0;20), discarding the state of the PRNG:

[[range(0;20)]] | at_random_without_replacement(20) | .[0]
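
Coming back to the original question, the two functions can be combined to attach a "related" array to each item. What follows is only a sketch under some assumptions: add_related and its parameter maxrel are hypothetical names, the input is assumed to be the array of {title, ...} objects produced by the jq filter in the first comment, it must contain at least two items, and the copy of at_random_without_replacement used here returns [ selected, newseed ] as its header comment documents. A fixed digit string stands in for /dev/urandom so the run is repeatable.

```shell
cat > related_demo.jq <<'EOF'
def at_random(n):
  n as $n
  | ($n - 1 | tostring | length) as $count
  | .[1] as $ix
  | if ($ix + $count) > ($urandom|length) then error("insufficient entropy")
    else ($urandom[$ix:$ix+$count] | tonumber) as $trial
    | if $trial >= $n then [0, $ix+1] | at_random($n)
      else [ $trial, ($ix + $count) ]
      end
    end;

def at_random_without_replacement(m):
  def arwr(m):
    if m == 0 then .
    else .[0] as $selected
    | .[1] as $remaining
    | .[2] as $ix
    | ($remaining|length) as $n
    | [0, $ix] | at_random($n)
    | .[0] as $j
    | [ $selected + [$remaining[$j]],
        $remaining[0 : $j] + $remaining[$j+1 :],
        .[1] ] | arwr(m-1)
    end;
  if m > (.[0]|length) then error("The given array does not have \(m) items")
  else [[], .[0], (.[1]//0)] | arwr(m) | [.[0], .[2]]
  end;

# Hypothetical helper: give every item between 1 and maxrel related titles,
# never including the item itself. The state is [ results, seed ].
def add_related(maxrel):
  . as $items
  | reduce range(0; length) as $i ([[], 0];
      .[1] as $seed
      | ($items[0:$i] + $items[$i+1:] | map({title})) as $others
      | ([maxrel, ($others|length)] | min) as $cap
      | ([0, $seed] | at_random($cap)) as $k        # $k[0] + 1 items to pick
      | ([$others, $k[1]] | at_random_without_replacement($k[0] + 1)) as $pick
      | [ .[0] + [$items[$i] + {related: $pick[0]}], $pick[1] ])
  | .[0];

add_related(5)
EOF

items='[{"title":"FRESH SNOWPEAS"},{"title":"SAUTEED SPINACH"},{"title":"CRIMINI MUSHROOMS"}]'
related=$(printf '%s\n' "$items" | jq -c --arg urandom 11007001 -f related_demo.jq)
echo "$related"
```

With this particular digit string, FRESH SNOWPEAS gets both other titles, and the other two items get one related title each. For real use, replace the fixed digits with "$(head -c "$N" /dev/urandom | tr -cd '0-9')" and pipe the projected curl output in; since every item consumes several random numbers, $N needs to be sized with the rule of thumb given earlier, otherwise the run stops with "insufficient entropy".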