yago-naga / yago4

Yago 4 - the next version of Yago
https://yago-knowledge.org/downloads/yago-4
GNU General Public License v3.0

Running Yago4 code still doesn't give the same results as expected #7

Open maxpoulain opened 3 years ago

maxpoulain commented 3 years ago

Hi,

I have a new question/issue regarding the yago4 code.

I ran the code on the latest Wikidata dump following the README, and I hit a problem when running cargo run --release -- -c wd-preprocessed.db build -o yago4 --full:

It writes the output files, but the process never finishes and never writes the stats file. It seems stuck: we can let it run for hours and nothing changes, yet no error is reported. Also, some of the files are much smaller than those published here, as you can see:

1140834041 3,0M -rw-r--r-- 1 root root 3,0M 23 sept. 11:26 yago-wd-annotated-facts.ntx.gz
1140834036 9,9M -rw-r--r-- 1 root root 9,9M 23 sept. 10:56 yago-wd-class.nt.gz
1140834040 156M -rw-r--r-- 1 root root 156M 23 sept. 11:26 yago-wd-facts.nt.gz
1140834035 3,1G -rw-r--r-- 1 root root 3,1G 23 sept. 10:32 yago-wd-full-types.nt.gz
1140834034  26G -rw-r--r-- 1 root root  26G 23 sept. 12:01 yago-wd-labels.nt.gz
1140834039 4,1G -rw-r--r-- 1 root root 4,1G 23 sept. 10:39 yago-wd-sameAs.nt.gz
1140834038  48K -rw-r--r-- 1 root root  45K 23 sept. 10:26 yago-wd-schema.nt.gz
1140834037  16K -rw-r--r-- 1 root root  16K 23 sept. 10:26 yago-wd-shapes.nt.gz
1140834033 9,6G -rw-r--r-- 1 root root 9,6G 23 sept. 10:38 yago-wd-simple-types.nt.gz

In particular, our yago-wd-facts.nt.gz and yago-wd-annotated-facts.ntx.gz are far smaller than expected.
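Since the build never finishes, one thing worth ruling out is that the small files were simply cut off mid-write. The following is a minimal sketch, assuming only Python's standard library; the function name gzip_is_complete is hypothetical and not part of yago4. It decompresses a file to the end and reports whether the gzip end-of-stream marker was reached.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to its end.

    Hypothetical helper for illustration; it is not part of yago4.
    """
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard the decompressed data chunk by chunk so that
            # memory use stays bounded even for very large files.
            while stream.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended before the end-of-stream marker.
        # OSError / zlib.error: corrupt or unreadable data (OSError also
        # covers a missing file).
        return False
    return True
```

For example, gzip_is_complete("yago-wd-facts.nt.gz") returning False would suggest the writer was interrupted. A complete file that is still small would point elsewhere, for instance at the input data. On multi-gigabyte files this takes a while, since the whole file has to be decompressed; gzip -t performs a comparable integrity test from the command line.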

After the first step of Yago, cargo run --release -- -c wd-preprocessed.db partition -f latest-all.nt.gz, the resulting wd-preprocessed.db is around 422 GB, so I think that part is fine.

Here are the logs I have:

Updating crates.io index
  Downloaded nom v5.1.2
  Downloaded synstructure v0.12.4
  Downloaded rustc-hash v1.1.0
  Downloaded percent-encoding v2.1.0
  Downloaded textwrap v0.11.0
  Downloaded rustc-demangle v0.1.16
  Downloaded rio_api v0.5.0
  Downloaded ansi_term v0.11.0
  Downloaded memoffset v0.5.6
  Downloaded crossbeam-channel v0.4.4
  Downloaded backtrace v0.3.50
  Downloaded humantime v1.3.0
  Downloaded object v0.20.0
  Downloaded regex-syntax v0.6.18
  Downloaded cc v1.0.60
  Downloaded gimli v0.22.0
  Downloaded crossbeam-epoch v0.8.2
  Downloaded glob v0.3.0
  Downloaded libloading v0.5.2
  Downloaded idna v0.2.0
  Downloaded scopeguard v1.1.0
  Downloaded tinyvec v0.3.4
  Downloaded libc v0.2.77
  Downloaded crossbeam v0.7.3
  Downloaded unicode-bidi v0.3.4
  Downloaded lazy_static v1.4.0
  Downloaded cexpr v0.4.0
  Downloaded autocfg v1.0.1
  Downloaded crossbeam-deque v0.7.3
  Downloaded log v0.4.11
  Downloaded crossbeam-utils v0.7.2
  Downloaded num-traits v0.2.12
  Downloaded byteorder v1.3.4
  Downloaded strsim v0.8.0
  Downloaded crc32fast v1.2.0
  Downloaded addr2line v0.13.0
  Downloaded clang-sys v0.29.3
  Downloaded rocksdb v0.15.0
  Downloaded adler v0.2.3
  Downloaded quote v1.0.7
  Downloaded memchr v2.3.3
  Downloaded regex v1.3.9
  Downloaded jobserver v0.1.21
  Downloaded peeking_take_while v0.1.2
  Downloaded shlex v0.1.1
  Downloaded clap v2.33.3
  Downloaded unicode-normalization v0.1.13
  Downloaded cloudflare-zlib-sys v0.2.0
  Downloaded bindgen v0.54.0
  Downloaded version_check v0.9.2
  Downloaded vec_map v0.8.2
  Downloaded miniz_oxide v0.4.2
  Downloaded which v3.1.1
  Downloaded url v2.1.1
  Downloaded syn v1.0.41
  Downloaded thread_local v1.0.1
  Downloaded unicode-width v0.1.8
  Downloaded maybe-uninit v2.0.0
  Downloaded quick-error v1.2.3
  Downloaded proc-macro2 v1.0.21
  Downloaded matches v0.1.8
  Downloaded failure v0.1.8
  Downloaded unicode-xid v0.2.1
  Downloaded time v0.1.44
  Downloaded termcolor v1.1.0
  Downloaded flate2 v1.0.17
  Downloaded env_logger v0.7.1
  Downloaded crossbeam-queue v0.2.3
  Downloaded num-integer v0.1.43
  Downloaded bitflags v1.2.1
  Downloaded chrono v0.4.15
  Downloaded atty v0.2.14
  Downloaded aho-corasick v0.7.13
  Downloaded cfg-if v0.1.10
  Downloaded failure_derive v0.1.8
  Downloaded lazycell v1.3.0
  Downloaded librocksdb-sys v6.11.4
  Downloaded oxiri v0.1.1
  Downloaded rio_turtle v0.5.0
  Downloaded oxilangtag v0.1.1
  Downloaded 80 crates (15.0 MB) in 1.18s (largest was `librocksdb-sys` at 10.0 MB)
   Compiling libc v0.2.77
   Compiling autocfg v1.0.1
   Compiling cfg-if v0.1.10
   Compiling lazy_static v1.4.0
   Compiling proc-macro2 v1.0.21
   Compiling memchr v2.3.3
   Compiling unicode-xid v0.2.1
   Compiling version_check v0.9.2
   Compiling glob v0.3.0
   Compiling bitflags v1.2.1
   Compiling log v0.4.11
   Compiling unicode-width v0.1.8
   Compiling maybe-uninit v2.0.0
   Compiling quick-error v1.2.3
   Compiling regex-syntax v0.6.18
   Compiling termcolor v1.1.0
   Compiling strsim v0.8.0
   Compiling vec_map v0.8.2
   Compiling ansi_term v0.11.0
   Compiling bindgen v0.54.0
   Compiling rustc-hash v1.1.0
   Compiling shlex v0.1.1
   Compiling syn v1.0.41
   Compiling lazycell v1.3.0
   Compiling peeking_take_while v0.1.2
   Compiling crc32fast v1.2.0
   Compiling matches v0.1.8
   Compiling adler v0.2.3
   Compiling tinyvec v0.3.4
   Compiling scopeguard v1.1.0
   Compiling failure_derive v0.1.8
   Compiling gimli v0.22.0
   Compiling rustc-demangle v0.1.16
   Compiling object v0.20.0
   Compiling byteorder v1.3.4
   Compiling oxiri v0.1.1
   Compiling oxilangtag v0.1.1
   Compiling rio_api v0.5.0
   Compiling percent-encoding v2.1.0
   Compiling unicode-bidi v0.3.4
   Compiling humantime v1.3.0
   Compiling textwrap v0.11.0
   Compiling thread_local v1.0.1
   Compiling rio_turtle v0.5.0
   Compiling unicode-normalization v0.1.13
   Compiling aho-corasick v0.7.13
   Compiling nom v5.1.2
   Compiling quote v1.0.7
   Compiling crossbeam-utils v0.7.2
   Compiling memoffset v0.5.6
   Compiling miniz_oxide v0.4.2
   Compiling crossbeam-epoch v0.8.2
   Compiling num-traits v0.2.12
   Compiling num-integer v0.1.43
   Compiling idna v0.2.0
   Compiling clang-sys v0.29.3
   Compiling jobserver v0.1.21
   Compiling atty v0.2.14
   Compiling which v3.1.1
   Compiling time v0.1.44
   Compiling clap v2.33.3
   Compiling cc v1.0.60
   Compiling crossbeam-channel v0.4.4
   Compiling crossbeam-queue v0.2.3
   Compiling url v2.1.1
   Compiling crossbeam-deque v0.7.3
   Compiling regex v1.3.9
   Compiling crossbeam v0.7.3
   Compiling chrono v0.4.15
   Compiling addr2line v0.13.0
   Compiling cexpr v0.4.0
   Compiling env_logger v0.7.1
   Compiling backtrace v0.3.50
   Compiling synstructure v0.12.4
   Compiling libloading v0.5.2
   Compiling cloudflare-zlib-sys v0.2.0
   Compiling flate2 v1.0.17
   Compiling failure v0.1.8
   Compiling librocksdb-sys v6.11.4
   Compiling rocksdb v0.15.0
   Compiling yago4 v0.0.0 (/yago4)
    Finished release [optimized] target(s) in 2m 09s
     Running `target/release/yago4 -c /home/ec2-user/wd-preprocessed.db build -o /home/ec2-user/yago4 --full`
Generating Wikidata to Yago URI mapping
Considering all Wikidata items
Generating Yago class set
Generating Yago subClassOf relations
Generating Wikidata to Yago class mapping
Generating the list of instances for each shape
Writing file yago-wd-simple-types.nt.gz
Writing file yago-wd-labels.nt.gz
Writing file yago-wd-full-types.nt.gz
Writing file yago-wd-class.nt.gz
Writing file yago-wd-shapes.nt.gz
Writing file yago-wd-schema.nt.gz
Writing file yago-wd-sameAs.nt.gz
Writing file yago-wd-facts.nt.gz
Writing file yago-wd-annotated-facts.ntx.gz

I know this doesn't give many clues as to where the problem lies, but I would like to understand whether I'm doing something wrong or whether there is a bug somewhere.

Thanks for your help.

Maxime

Vikas-Jethwani commented 3 years ago

@maxpoulain Did you resolve the issue? I have the same issue.

maxpoulain commented 3 years ago

No, I did not solve the problem.

@Tpt Do you have the same error/bug?

vasoto commented 3 years ago

I ran into the same or a very similar issue. The yago4 process consumes all available memory (128 GB in my case) and exits quietly. The error shows up only in /var/log/syslog on Ubuntu, with a message like this: yago4 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
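To check a machine for the same symptom, a log scan along these lines may help. This is a minimal sketch that assumes the kernel message has the shape quoted above and that the log lives at /var/log/syslog, as on Ubuntu; other distributions may log elsewhere. The names OOM_PATTERN, find_oom_events and scan_log are hypothetical.

```python
import re

# Matches kernel lines such as "yago4 invoked oom-killer: gfp_mask=...".
# The pattern is an assumption based on the message quoted in this thread.
OOM_PATTERN = re.compile(r"(?P<process>\S+) invoked oom-killer")


def find_oom_events(lines):
    """Return (process, line) pairs for every line that reports an
    oom-killer invocation."""
    events = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            events.append((match.group("process"), line.rstrip("\n")))
    return events


def scan_log(path="/var/log/syslog"):
    """Scan a log file (the default path is the Ubuntu location)."""
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(path, errors="replace") as log:
        return find_oom_events(log)
```

Calling scan_log() usually needs read permission on the log file. An empty result only means no such message is in that particular file, since older entries may already have been rotated out.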

@maxpoulain, @Vikas-Jethwani, could you confirm whether you are experiencing the same problem or a different one?

@Tpt could you please share the configuration (RAM, CPU, OS) used to generate Yago4?