nspcc-dev / locode-db

Source of UN/LOCODE database generated by NeoFS CLI.
MIT License

Optimize unpacked DB #24

Closed roman-khimov closed 9 months ago

roman-khimov commented 10 months ago

Is your feature request related to a problem? Please describe.

I'm always frustrated when we're wasting memory for nothing. We now have a map with a lot of elements, and each element adds substantial overhead, comparable to the size of the element data itself (on 64-bit platforms a string header is 16 bytes even if the contents are 3 bytes, and a []byte header is 24).
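
The header sizes mentioned above can be checked with a small sketch like the following. The helper name `headerSizes` is made up for illustration and is not part of locode-db; the numbers cover only the string and slice headers, not the bytes they point to or any map bucket overhead.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerSizes (hypothetical helper) reports the in-memory size of a
// string header and a []byte header, excluding the referenced bytes.
func headerSizes() (strHdr, sliceHdr uintptr) {
	var s string
	var b []byte
	return unsafe.Sizeof(s), unsafe.Sizeof(b)
}

func main() {
	s, b := headerSizes()
	// A 3-byte value such as "ABC" still needs a full header when it is
	// stored as a separate string or slice.
	fmt.Printf("string header: %d bytes, []byte header: %d bytes\n", s, b)
}
```

A string header is a pointer plus a length, and a slice header adds a capacity, so on 32-bit platforms the sizes would be 8 and 12 bytes instead.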

Describe the solution you'd like

Unpack the text file as is, implement binary search over the plaintext, and convert the result as needed. The file is sorted, so this can be done easily.
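
A minimal sketch of that idea, under stated assumptions: the data is a single byte slice of '\n'-separated lines, each line starts with the key followed by a comma, and the lines are sorted bytewise. The names `lineAt` and `findLine` and the sample rows are invented for illustration and are not taken from the repository.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// lineAt returns the line that contains byte offset c, without its
// trailing '\n'. A '\n' byte counts as part of the line it terminates.
func lineAt(data []byte, c int) []byte {
	start := bytes.LastIndexByte(data[:c], '\n') + 1
	end := bytes.IndexByte(data[start:], '\n')
	if end < 0 {
		return data[start:] // last line has no trailing newline
	}
	return data[start : start+end]
}

// findLine (hypothetical) binary-searches sorted plaintext for the line
// that begins with key followed by a comma.
func findLine(data []byte, key string) ([]byte, bool) {
	prefix := []byte(key + ",")
	// Search over byte offsets: every offset maps to its line, and since
	// the lines are sorted the predicate is monotonic over offsets.
	loc := sort.Search(len(data), func(c int) bool {
		line := lineAt(data, c)
		if len(line) > len(prefix) {
			line = line[:len(prefix)]
		}
		return bytes.Compare(line, prefix) >= 0
	})
	if loc == len(data) {
		return nil, false
	}
	line := lineAt(data, loc)
	if !bytes.HasPrefix(line, prefix) {
		return nil, false
	}
	return line, true
}

func main() {
	// Made-up sample rows, not real UN/LOCODE data.
	data := []byte("AAAAA,one\nBBBBB,two\nCCCCC,three\n")
	line, ok := findLine(data, "BBBBB")
	fmt.Printf("%q %v\n", line, ok)
	_, ok = findLine(data, "ABCDE")
	fmt.Println(ok)
}
```

Nothing is stored per element here beyond the text itself, but every probe has to scan backwards to find the start of its line, so each lookup costs more than a map access.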

Describe alternatives you've considered

Keep wasting memory for nothing, no.

roman-khimov commented 10 months ago

~6 vs ~18 MB this way.

roman-khimov commented 10 months ago

Just for the record, searching over the plaintext can be done with

-       locodeCSV, found := mLocodes[locodeStr]
-       if !found {
+       loc := sort.Search(len(locodesData), func(c int) bool {
+               if locodesData[c] == '\n' && c != 0 {
+                       c--
+               }
+               str := bytes.LastIndexByte(locodesData[:c], '\n')
+               if str == -1 {
+                       str = 0
+               } else {
+                       str++
+               }
+               cmp := bytes.Compare(locodesData[str:str+len(locodeStr)], []byte(locodeStr))
+               return cmp >= 0
+       })
+       if loc == len(locodesData) || bytes.Compare([]byte(locodeStr), locodesData[loc:loc+len(locodeStr)]) != 0 {
                return Record{}, ErrNotFound
        }
+       reader := csv.NewReader(bytes.NewReader(locodesData[loc:]))
+       record, err := reader.Read()
+       if err != nil {
+               return Record{}, err
+       }
+
+       cont, _ := strconv.ParseUint(record[3], 10, 8)
+       var continent = Continent(uint8(cont))
+
+       lat, err := strconv.ParseFloat(record[6], 64)
+       if err != nil {
+               return Record{}, err
+       }
+       lon, err := strconv.ParseFloat(record[7], 64)
+       if err != nil {
+               return Record{}, err
+       }

and appropriate other changes, but it's less memory-efficient than #28 and really slow on Get() (about 20 times slower than #28).
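
For completeness, the conversion step from the diff could be sketched with every error checked, including the one from `strconv.ParseUint` that the diff discards. The field positions (3 for the continent, 6 and 7 for latitude and longitude) are taken from the diff above; the `coords` type, the `parseLine` name, and the sample line are assumptions made for illustration only.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// coords is a hypothetical stand-in for the fields the diff extracts.
type coords struct {
	Continent uint8
	Lat, Lon  float64
}

// parseLine (hypothetical) converts one CSV line into coords.
func parseLine(line string) (coords, error) {
	rec, err := csv.NewReader(strings.NewReader(line)).Read()
	if err != nil {
		return coords{}, err
	}
	if len(rec) < 8 {
		return coords{}, fmt.Errorf("expected at least 8 fields, got %d", len(rec))
	}
	// Bit size 8 makes ParseUint reject values that do not fit in uint8.
	cont, err := strconv.ParseUint(rec[3], 10, 8)
	if err != nil {
		return coords{}, fmt.Errorf("continent: %w", err)
	}
	lat, err := strconv.ParseFloat(rec[6], 64)
	if err != nil {
		return coords{}, fmt.Errorf("latitude: %w", err)
	}
	lon, err := strconv.ParseFloat(rec[7], 64)
	if err != nil {
		return coords{}, fmt.Errorf("longitude: %w", err)
	}
	return coords{Continent: uint8(cont), Lat: lat, Lon: lon}, nil
}

func main() {
	// Made-up line with placeholder fields, not a real database row.
	c, err := parseLine("XXAAA,f1,f2,4,f4,f5,12.5,-7.25")
	fmt.Println(c, err)
}
```

In this approach the parsing runs on every lookup rather than once at load time, which may account for part of the Get() cost noted above.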