masayuki14 closed 6 years ago
https://hub.docker.com/_/java/ Use the official Java image as the base.
The build fails with what looks like an out-of-disk-space error.
Step 3/10 : RUN apt-get upgrade
---> Running in 21b8b755d7d4
Reading package lists...
Building dependency tree...
Reading state information...
The following packages have been kept back:
openjdk-8-jdk openjdk-8-jdk-headless openjdk-8-jre openjdk-8-jre-headless
The following packages will be upgraded:
base-files bzr ca-certificates curl debconf debconf-i18n
debian-archive-keyring git git-man gnupg gpgv libc-bin libc6 libcups2
libcurl3 libcurl3-gnutls libdb5.3 libexpat1 libffi6 libfreetype6 libgcrypt20
libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-common libgnutls-deb0-28 libgraphite2-3
libgssapi-krb5-2 libgtk2.0-0 libgtk2.0-bin libgtk2.0-common libicu52
libjasper1 libk5crypto3 libkrb5-3 libkrb5support0 liblcms2-2 libldap-2.4-2
libncurses5 libncursesw5 libnss3 libpam-modules libpam-modules-bin libpam0g
librtmp1 libssl1.0.0 libsvn1 libsystemd0 libtasn1-6 libtiff5 libtinfo5
libudev1 libx11-6 libx11-data libx11-dev libx11-doc libx11-xcb1 libxcursor1
libxfixes3 libxi6 libxml2 libxrandr2 libxtst6 login mercurial
mercurial-common multiarch-support ncurses-base ncurses-bin openssh-client
openssl passwd perl perl-base perl-modules python-bzrlib sensible-utils
subversion systemd systemd-sysv tzdata udev unzip wget
82 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
Need to get 59.3 MB of archives.
After this operation, 2279 kB disk space will be freed.
Do you want to continue? [Y/n] Abort.
The command '/bin/sh -c apt-get upgrade' returned a non-zero code: 1
Docker Preference > Disk > [Resize disk image]
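The disk image is 64 GB with only about 17 GB allocated, so there should be enough space. Changing the size from 64 GB to 96 GB made no difference.
For now, delete all existing images.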
$ docker images -q | xargs -I_ docker rmi _
Deleting them made no difference either.
RUN apt-get -y upgrade
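Adding the -y option made it work. The failing log ends with "Do you want to continue? [Y/n] Abort.", so the real problem was the confirmation prompt, which nothing can answer during docker build, and not disk space.
It sometimes works without -y, so it probably depends on the base image (presumably when there is nothing to upgrade, no prompt appears).
I'll always add -y from now on.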
Embulk now runs in Docker, so try it with JSON input.
$ embulk example ./example
Run the command above, then edit seed.yml:
in:
  type: file
  path_prefix: '/work/./example/json/tripadvisor_'
out:
  type: stdout
Copy the TripAdvisor JSON data into example/json/.
Run the guess command to create config.yml:
embulk guess example/seed.yml -o config.yml
2018-02-13 02:31:59.138 +0000: Embulk v0.9.2
********************************** INFORMATION **********************************
Join us! Embulk-announce mailing list is up for IMPORTANT announcement such as
compatibility-breaking changes and key feature updates.
https://groups.google.com/forum/#!forum/embulk-announce
*********************************************************************************
2018-02-13 02:32:02.560 +0000 [INFO] (main): Gem's home and path are set by default: "/root/.embulk/lib/gems"
2018-02-13 02:32:03.377 +0000 [INFO] (main): Started Embulk v0.9.2
2018-02-13 02:32:03.443 +0000 [INFO] (0001:guess): Listing local files at directory '/work/example/json' filtering filename by prefix 'tripadvisor_'
2018-02-13 02:32:03.445 +0000 [INFO] (0001:guess): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2018-02-13 02:32:03.454 +0000 [INFO] (0001:guess): Loading files [/work/example/json/tripadvisor_uji_things_to_do_20180209.json]
2018-02-13 02:32:03.475 +0000 [INFO] (0001:guess): Try to read 32,768 bytes from input source
2018-02-13 02:32:03.553 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.2)
2018-02-13 02:32:03.568 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.2)
2018-02-13 02:32:03.588 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.2)
2018-02-13 02:32:03.615 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.2)
in:
  type: file
  path_prefix: /work/./example/json/tripadvisor_
  parser: {charset: UTF-8, newline: LF}
out: {type: stdout}
Created 'config.yml' file.
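Since the input was already type: file, the generated config barely changed.
Maybe type: json would work?
Setting type: json produced the error below. I should look into this properly.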
Error: InputPlugin 'json' is not found.
Unknown input plugin 'json'. embulk/input/json.rb is not installed. Run 'embulk gem search -rd embulk-input' command to find plugins.
https://takeshiyako.blogspot.jp/2015/04/embulk-json-google-bigquery.html https://qiita.com/shun0102/items/8989e6ed2ee0f46a0fa9
Apparently embulk-parser-jsonl is the plugin to use.
http://www.embulk.org/plugins/ As expected, the official site is the place to check:
jsonl is listed under FILE PARSER.
$ embulk gem install embulk-parser-jsonl
$ embulk guess -g jsonl example/seed.yml -o config.yml
2018-02-13 02:45:52.092 +0000: Embulk v0.9.2
********************************** INFORMATION **********************************
Join us! Embulk-announce mailing list is up for IMPORTANT announcement such as
compatibility-breaking changes and key feature updates.
https://groups.google.com/forum/#!forum/embulk-announce
*********************************************************************************
2018-02-13 02:45:54.356 +0000 [INFO] (main): Gem's home and path are set by default: "/root/.embulk/lib/gems"
2018-02-13 02:45:55.125 +0000 [INFO] (main): Started Embulk v0.9.2
2018-02-13 02:45:55.185 +0000 [INFO] (0001:guess): Listing local files at directory '/work/example/json' filtering filename by prefix 'tripadvisor_'
2018-02-13 02:45:55.187 +0000 [INFO] (0001:guess): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2018-02-13 02:45:55.201 +0000 [INFO] (0001:guess): Loading files [/work/example/json/tripadvisor_uji_things_to_do_20180209.json]
2018-02-13 02:45:55.225 +0000 [INFO] (0001:guess): Try to read 32,768 bytes from input source
2018-02-13 02:45:55.305 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.2)
2018-02-13 02:45:55.330 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.2)
2018-02-13 02:45:55.365 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.2)
2018-02-13 02:45:55.626 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.2)
2018-02-13 02:45:55.679 +0000 [INFO] (0001:guess): Loaded plugin embulk-parser-jsonl (0.2.0)
org.jruby.exceptions.RaiseException: (ParserError) A JSON text must at least contain two octets!
at json.ext.Parser.initialize(json/ext/Parser.java:175)
at json.ext.Parser.new(json/ext/Parser.java:151)
at RUBY.parse(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/json/common.rb:155)
at RUBY.block in guess_lines(/root/.embulk/lib/gems/gems/embulk-parser-jsonl-0.2.0/lib/embulk/guess/jsonl.rb:18)
at org.jruby.RubyArray.each(org/jruby/RubyArray.java:1735)
at RUBY.guess_lines(/root/.embulk/lib/gems/gems/embulk-parser-jsonl-0.2.0/lib/embulk/guess/jsonl.rb:17)
at RUBY.guess(uri:classloader:/gems/embulk-0.9.2-java/lib/embulk/guess_plugin.rb:121)
at RUBY.guess(uri:classloader:/gems/embulk-0.9.2-java/lib/embulk/guess_plugin.rb:24)
Error: (ParserError) A JSON text must at least contain two octets!
Maybe there is no need to insist on guess.
It may be failing because the JSON is not line-delimited data. The file is a single array:
[
{ ... },
{ ... }
]
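The stack trace above shows the jsonl guess plugin parsing the input line by line (guess_lines in jsonl.rb), which would explain why a file whose first line is just "[" fails. One alternative to switching parsers would be to convert the array into JSON Lines first. A minimal sketch, assuming the whole file fits in memory; the function name array_to_jsonl and the second sample record are made up for illustration:

```python
import json


def array_to_jsonl(text: str) -> str:
    """Convert a JSON document holding one top-level array into JSON Lines."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # One compact JSON value per line; ensure_ascii=False keeps non-ASCII text readable.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


if __name__ == "__main__":
    # Hypothetical sample shaped like the array above.
    sample = '[\n  {"title": "Sawarabi Street", "review": 32},\n  {"title": "Another place", "review": 5}\n]'
    print(array_to_jsonl(sample), end="")
```

Each output line is then a complete JSON object, which is the shape a line-oriented parser expects.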
$ embulk gem install embulk-parser-json
Try a regular JSON parser instead.
https://github.com/takumakanari/embulk-parser-json
Write config.yml by hand, using the Example in the README as a reference:
in:
  type: file
  path_prefix: /work/./example/json/tripadvisor_
  parser:
    type: jsonpath
    root: $
    stop_on_invalid_record: false
    schema:
      - { name: detail_url, type: string }
      - { name: title, type: string }
      - { name: rate, type: string }
      - { name: review, type: long }
      - { name: part, type: string }
      - { name: tags, type: string }
      - { name: rating5, type: long }
      - { name: rating4, type: long }
      - { name: rating3, type: long }
      - { name: rating2, type: long }
      - { name: rating1, type: long }
      - { name: street_address, type: string }
      - { name: address_locality, type: string }
      - { name: postal_code, type: string }
      - { name: place_id_g, type: string }
      - { name: place_id_d, type: string }
      - { name: lng, type: double }
      - { name: lat, type: double }
      - { name: images, type: string, path: "images[0]" }
out: {type: stdout}
Dry run: it worked.
type: integer and type: int were rejected, but type: long was accepted.
Are the types accepted in this type field Java types? I have not found the documentation yet.
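As far as I can tell, these are Embulk's own column types and not Java types: boolean, long, double, string, timestamp, and json. That would explain why integer and int are rejected. A small sketch for checking a schema before running Embulk, assuming that type set is right for the parser plugin in use (worth confirming against the plugin's README); EMBULK_TYPES and invalid_types are names made up here:

```python
# Column types as I understand Embulk's core type set (an assumption; check the docs).
EMBULK_TYPES = {"boolean", "long", "double", "string", "timestamp", "json"}


def invalid_types(schema):
    """Return (name, type) pairs whose type is not in EMBULK_TYPES."""
    return [(col["name"], col["type"]) for col in schema if col["type"] not in EMBULK_TYPES]


if __name__ == "__main__":
    schema = [
        {"name": "review", "type": "integer"},  # the spelling that was rejected
        {"name": "rating5", "type": "long"},
        {"name": "lat", "type": "double"},
    ]
    print(invalid_types(schema))  # prints [('review', 'integer')]
```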
$ embulk preview config.yml
2018-02-13 04:54:56.513 +0000: Embulk v0.9.2
********************************** INFORMATION **********************************
Join us! Embulk-announce mailing list is up for IMPORTANT announcement such as
compatibility-breaking changes and key feature updates.
https://groups.google.com/forum/#!forum/embulk-announce
*********************************************************************************
2018-02-13 04:55:00.411 +0000 [INFO] (main): Gem's home and path are set by default: "/root/.embulk/lib/gems"
2018-02-13 04:55:01.312 +0000 [INFO] (main): Started Embulk v0.9.2
2018-02-13 04:55:01.404 +0000 [INFO] (0001:preview): Listing local files at directory '/work/example/json' filtering filename by prefix 'tripadvisor_'
2018-02-13 04:55:01.408 +0000 [INFO] (0001:preview): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2018-02-13 04:55:01.431 +0000 [INFO] (0001:preview): Loading files [/work/example/json/tripadvisor_uji.json, /work/example/json/tripadvisor_uji.json~, /work/example/json/tripadvisor_uji.json.jq]
2018-02-13 04:55:01.456 +0000 [INFO] (0001:preview): Try to read 32,768 bytes from input source
2018-02-13 04:55:01.855 +0000 [INFO] (0001:preview): Loaded plugin embulk-parser-json (0.0.7)
2018-02-13 04:55:01.870 +0000 [WARN] (0001:preview): 'embulk-parser-json' has been deprecated.
2018-02-13 04:55:01.870 +0000 [WARN] (0001:preview): Just use 'embulk-parser-jsonpath' (https://rubygems.org/gems/embulk-parser-jsonpath) instead.
+------------------------------------------------------------------------------------------------------------------------+-----------------+-------------+-------------+-------------+--------------------------------------------------------------------------+--------------+--------------+--------------+--------------+--------------+-----------------------+-------------------------+--------------------+-------------------+-------------------+------------+------------+---------------+
| detail_url:string | title:string | rate:string | review:long | part:string | tags:string | rating5:long | rating4:long | rating3:long | rating2:long | rating1:long | street_address:string | address_locality:string | postal_code:string | place_id_g:string | place_id_d:string | lng:double | lat:double | images:string |
+------------------------------------------------------------------------------------------------------------------------+-----------------+-------------+-------------+-------------+--------------------------------------------------------------------------+--------------+--------------+--------------+--------------+--------------+-----------------------+-------------------------+--------------------+-------------------+-------------------+------------+------------+---------------+
| https://www.tripadvisor.com/Attraction_Review-g946495-d1867744-Reviews-Sawarabi_Street-Uji_Kyoto_Prefecture_Kinki.html | Sawarabi Street | 4.0 | 32 | | Points of Interest & Landmarks,Historic Walking Areas,Sights & Landmarks | 5 | 15 | 12 | 0 | 0 | Uji | Uji, | | g946495 | d1867744 | 135.78813 | 34.88997 | |
+------------------------------------------------------------------------------------------------------------------------+-----------------+-------------+-------------+-------------+--------------------------------------------------------------------------+--------------+--------------+--------------+--------------+--------------+-----------------------+-------------------------+--------------------+-------------------+-------------------+------------+------------+---------------+
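The images:string column is coming back empty, so investigate.
There was a typo: the image path in the schema was written as imgaes.
After fixing it, the URL was extracted correctly.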
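Because stop_on_invalid_record is false, a misspelled path only shows up as an empty column, which is easy to miss. A rough sketch of a pre-check that compares each schema entry against the keys of one sample record. It only handles simple paths like "images[0]" (no leading "$."), and the function name missing_keys and the example.com URL are hypothetical:

```python
import re


def missing_keys(schema, record):
    """Return (column name, top-level key) pairs whose key is absent from record."""
    missing = []
    for col in schema:
        source = col.get("path", col["name"])
        # Take the part before the first "." or "[" as the top-level key.
        key = re.split(r"[.\[]", source, maxsplit=1)[0]
        if key not in record:
            missing.append((col["name"], key))
    return missing


if __name__ == "__main__":
    schema = [
        {"name": "title", "type": "string"},
        {"name": "images", "type": "string", "path": "imgaes[0]"},  # the typo
    ]
    record = {"title": "Sawarabi Street", "images": ["https://example.com/a.jpg"]}
    print(missing_keys(schema, record))  # prints [('images', 'imgaes')]
```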
Create a Dockerfile that runs Embulk