oracle / ocifs

ocifs provides a POSIX-compatible API wrapping Oracle Cloud Infrastructure (OCI) Object Storage. ocifs is a Python library built on the fsspec framework.
https://ocifs.readthedocs.io/en/latest/
Universal Permissive License v1.0

Oracle Cloud Infrastructure Object Storage fsspec Implementation

The Oracle Cloud Infrastructure Object Storage service is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. With Object Storage, you can safely and securely store or retrieve data directly from the internet or from within the cloud platform.

ocifs is part of the fsspec intake/filesystem_spec ecosystem, which is described as:

a template or specification for a file-system interface, that specific implementations should follow, so that applications making use of them can rely on a common interface and not have to worry about the specific internal implementation decisions with any given backend.

ocifs joins the list of file systems supported with this package.

The intake/filesystem_spec project is used by Pandas, Dask and other data libraries in Python. This package adds Oracle OCI Object Storage capabilities to these libraries.
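To make the "common interface" idea concrete, here is a minimal sketch in plain Python. It does not use fsspec or ocifs: `MiniFileSystem`, `InMemoryFS`, `LocalDirFS` and `round_trip` are hypothetical names invented for this illustration, and the real specification covers far more methods. The point is only that application code written against the interface does not change when the backend does.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class MiniFileSystem(ABC):
    """Hypothetical, heavily reduced stand-in for a file-system specification."""

    @abstractmethod
    def write_bytes(self, path, data):
        """Store data under path."""

    @abstractmethod
    def cat(self, path):
        """Return the bytes stored under path."""


class InMemoryFS(MiniFileSystem):
    """Backend that keeps objects in a dict."""

    def __init__(self):
        self._store = {}

    def write_bytes(self, path, data):
        self._store[path] = data

    def cat(self, path):
        return self._store[path]


class LocalDirFS(MiniFileSystem):
    """Backend that keeps objects as files below a root directory."""

    def __init__(self, root):
        self._root = root

    def write_bytes(self, path, data):
        with open(os.path.join(self._root, path), "wb") as f:
            f.write(data)

    def cat(self, path):
        with open(os.path.join(self._root, path), "rb") as f:
            return f.read()


def round_trip(fs):
    """Application code: works with any backend that follows the interface."""
    fs.write_bytes("hello.txt", b"Writing to Object Storage!")
    return fs.cat("hello.txt")


# The same application function runs unchanged against both backends.
print(round_trip(InMemoryFS()))
with tempfile.TemporaryDirectory() as root:
    print(round_trip(LocalDirFS(root)))
```

In the real ecosystem, ocifs plays the role of one such backend, which is why libraries that already accept fsspec file systems can read `oci://` paths without OCI-specific code.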

Example of OCIFS file-system-style operations:


from ocifs import OCIFileSystem

fs = OCIFileSystem("~/.oci/config")

1. Create an empty file, or truncate an existing one, in an OCI Object Storage bucket

fs.touch("oci://my_bucket@my_namespace/hello.txt", truncate=True, data=b"Writing to Object Storage!")

2. Fetch the contents of (potentially multiple) paths

fs.cat("oci://my_bucket@my_namespace/hello.txt")

3. Get metadata about a file from a head or list call

fs.info("oci://my_bucket@my_namespace/hello.txt")

4. Get a directory listing page

fs.ls("oci://my_bucket@my_namespace/", detail=True)

5. Is this entry directory-like?

fs.isdir("oci://my_bucket@my_namespace")

6. Is this entry file-like?

fs.isfile("oci://my_bucket@my_namespace/hello.txt")

7. Is there a file at the given path (including broken links)?

fs.lexists("oci://my_bucket@my_namespace/hello.txt")

8. List the files under the given path

fs.listdir("oci://my_bucket@my_namespace/", detail=True)

9. Get the first size bytes of a file

fs.head("oci://my_bucket@my_namespace/hello.txt", size=1024)

10. Get the last size bytes of a file

fs.tail("oci://my_bucket@my_namespace/hello.txt", size=1024)

11. Hash of file properties, to tell if the file has changed

fs.ukey("oci://my_bucket@my_namespace/hello.txt")
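As a rough illustration of item 11, a "hash of file properties" can be modelled as a digest over a metadata dict such as the one `fs.info` returns. This is a sketch under assumptions, not ocifs's implementation: `ukey_from_info` is a hypothetical helper, and the dict keys shown (`name`, `size`, `type`) are only the conventional fsspec ones.

```python
import hashlib
import json


def ukey_from_info(info):
    """Return a stable hex digest of a metadata dict (hypothetical helper)."""
    # Sort keys so that dict ordering does not affect the digest.
    canonical = json.dumps(info, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative metadata for the same object before and after a rewrite.
before = {"name": "my_bucket@my_namespace/hello.txt", "size": 26, "type": "file"}
after = dict(before, size=61)

# Identical properties give identical keys; a changed size gives a new key.
print(ukey_from_info(before) == ukey_from_info(dict(before)))
print(ukey_from_info(before) == ukey_from_info(after))
```

A key like this only detects changes that show up in the metadata; two uploads with identical properties would be indistinguishable unless the metadata carries something like a modification time or entity tag.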

12. Size in bytes of a file

fs.size("oci://my_bucket@my_namespace/hello.txt")

13. Size in bytes of each file in a list of paths

paths = ["oci://my_bucket@my_namespace/hello.txt"]
fs.sizes(paths)

14. Normalise an OCI path string into bucket and key

fs.split_path("oci://my_bucket@my_namespace/hello.txt")
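The path normalisation in item 14 can be pictured with a small stand-alone parser for the `oci://bucket@namespace/key` form used throughout these examples. This is only a sketch: `parse_oci_uri` and `OCIPath` are hypothetical names, and the tuple returned by the real `fs.split_path` may be shaped differently.

```python
from typing import NamedTuple


class OCIPath(NamedTuple):
    """Parts of an oci:// URI (hypothetical container for this sketch)."""
    bucket: str
    namespace: str
    key: str


def parse_oci_uri(uri):
    """Split 'oci://bucket@namespace/key' into its three parts."""
    scheme = "oci://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an oci:// URI: {uri!r}")
    # Everything up to the first '/' is 'bucket@namespace'; the rest is the key.
    location, _, key = uri[len(scheme):].partition("/")
    bucket, at, namespace = location.partition("@")
    if not (bucket and at and namespace):
        raise ValueError(f"expected bucket@namespace in {uri!r}")
    return OCIPath(bucket, namespace, key)


print(parse_oci_uri("oci://my_bucket@my_namespace/hello.txt"))
print(parse_oci_uri("oci://my_bucket@my_namespace"))
```

A URI that names only the bucket yields an empty key, which corresponds to the bucket-level calls such as `fs.isdir` and `fs.ls` above.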

15. Delete a file from the bucket

fs.rm("oci://my_bucket@my_namespace/hello.txt")

16. Get the contents of the file as bytes

fs.read_bytes("oci://my_bucket@my_namespace/hello.txt", start=0, end=13)

17. Get the contents of the file as a string

fs.read_text("oci://my_bucket@my_namespace/hello.txt", encoding=None, errors=None, newline=None)

18. Read a block of bytes from the file

fs.read_block("oci://my_bucket@my_namespace/hello.txt", 0, 13)
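Items 9, 10, 16 and 18 all select a byte range but address it differently. The sketch below models those ranges on an in-memory `bytes` value so the arithmetic is easy to check. It assumes the usual fsspec conventions (`read_bytes` takes start and end offsets, `read_block` takes an offset and a length) and ignores the optional delimiter handling of `read_block`; the functions are local stand-ins, not the library methods.

```python
PAYLOAD = b"Writing to Object Storage!"


def read_bytes(data, start=None, end=None):
    """Bytes in the half-open range [start, end)."""
    return data[start:end]


def read_block(data, offset, length):
    """length bytes starting at offset (no delimiter handling)."""
    return data[offset:offset + length]


def head(data, size=1024):
    """The first size bytes."""
    return data[:size]


def tail(data, size=1024):
    """The last size bytes."""
    return data[-size:] if size > 0 else b""


# With offset 0, an end of 13 and a length of 13 select the same bytes.
print(read_bytes(PAYLOAD, start=0, end=13))
print(read_block(PAYLOAD, 0, 13))
print(head(PAYLOAD, size=7), tail(PAYLOAD, size=8))
```

The two range calls only coincide when the offset is zero: `read_bytes(data, 5, 13)` returns 8 bytes, while `read_block(data, 5, 13)` returns 13.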

19. Open a file for writing to an OCI Object Storage bucket

ocifs sets a best-guess content type for hello.txt, i.e. "text/plain":

with fs.open("oci://my_bucket@my_namespace/hello.txt", 'w', autocommit=True) as f:
    f.write("Writing data to buffer, before manually flushing and closing.")
    f.flush()
# the buffer is flushed and the file is closed when the with block exits

ocifs uses the content type passed to open when writing to the OCI Object Storage bucket:

with fs.open("oci://my_bucket@my_namespace/hello.txt", 'w', content_type='text/plain') as f:
    f.write("Writing data to buffer, before manually flushing and closing.")
    f.flush()
# the buffer is flushed and the file is closed when the with block exits
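To show what a "best-guessed" content type might look like in practice, the sketch below uses Python's standard `mimetypes` module to map a file name to a MIME type. Whether ocifs uses this module or another lookup is an assumption here; `guess_content_type` and its fallback value are hypothetical.

```python
import mimetypes


def guess_content_type(path, default="application/octet-stream"):
    """Guess a MIME type from the file extension, with a generic fallback."""
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or default


print(guess_content_type("hello.txt"))
print(guess_content_type("file_without_extension"))
```

When the guess would be wrong or too generic, passing `content_type` explicitly to `fs.open`, as in the second example of item 19, avoids relying on the extension.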

20. Open a file for reading from an OCI Object Storage bucket

with fs.open("oci://my_bucket@my_namespace/hello.txt") as f:
    print(f.read())

21. Space used by files and optionally directories within a path

fs.du("oci://my_bucket@my_namespace/hello10.csv")

22. Find files by glob-matching

fs.glob("oci://my_bucket@my_namespace/*.txt")
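For item 22, the following sketch shows which object keys a pattern such as `*.txt` would select if `*` is not allowed to cross a `/` separator. It is a simplified model built on the standard `fnmatch` module, with a hypothetical `glob_keys` helper and made-up keys; the real `fs.glob` supports more syntax (for example `**`) and works against the live bucket listing.

```python
import fnmatch


def glob_keys(keys, pattern):
    """Match keys segment by segment so that '*' never spans a '/'."""
    pattern_parts = pattern.split("/")
    matched = []
    for key in keys:
        key_parts = key.split("/")
        if len(key_parts) != len(pattern_parts):
            continue  # different depth, so the pattern cannot match
        if all(fnmatch.fnmatchcase(k, p) for k, p in zip(key_parts, pattern_parts)):
            matched.append(key)
    return sorted(matched)


# Made-up object keys inside one bucket.
keys = ["hello.txt", "hello2.txt", "data.csv", "logs/app.txt"]

print(glob_keys(keys, "*.txt"))
print(glob_keys(keys, "logs/*.txt"))
```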

23. Rename an object in a particular bucket in the tenancy namespace on OCI

fs.rename("oci://my_bucket@my_namespace/hello.txt", "oci://my_bucket@my_namespace/hello2.txt")

24. Delete multiple files from the same bucket

pathlist = ["oci://my_bucket@my_namespace/hello2.txt"]
fs.bulk_delete(pathlist)


### Or Use With Pandas

```python
import pandas as pd
import ocifs

df = pd.read_csv(
    "oci://my_bucket@my_namespace/my_object.csv",
    storage_options={"config": "~/.oci/config"},
)
```

### Or Use With PyArrow

import pandas as pd
import ocifs

df = pd.read_csv(
    "oci://my_bucket@my_namespace/my_object.csv",
    storage_options={"config": "~/.oci/config"},
)

### Or Use With ADSDataset

import ads
import pandas as pd
from ads.common.auth import default_signer
from ads.dataset.dataset import ADSDataset

ads.set_auth(auth="api_key", oci_config_location="~/.oci/config", profile="<profile_name>")
ds = ADSDataset(
    df=pd.read_csv("oci://my_bucket@my_namespace/my_object.csv", storage_options=default_signer()),
    type_discovery=False,
)
print(ds.df)

Getting Started

python3 -m pip install ocifs

Software Prerequisites

Python >= 3.6

Environment Variables for Authentication:

export OCIFS_IAM_TYPE=api_key
export OCIFS_CONFIG_LOCATION=~/.oci/config
export OCIFS_CONFIG_PROFILE=DEFAULT

Note, if you are operating on OCI with an alternative valid signer, such as resource principal, instead set the following:

export OCIFS_IAM_TYPE=resource_principal

Environment Variables for enabling Logging:

To quickly see all messages, you can set the environment variable OCIFS_LOGGING_LEVEL=DEBUG.

export OCIFS_LOGGING_LEVEL=DEBUG
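In a notebook or script there may be no shell in which to run `export`. The same variables can be set from Python instead, as sketched below. The helper `configure_ocifs_env` is hypothetical, and the sketch assumes that ocifs reads these variables from the process environment when it is imported or when the file system is created, so it is safest to call it before importing ocifs.

```python
import os


def configure_ocifs_env(iam_type="api_key", config_location="~/.oci/config",
                        profile="DEFAULT", logging_level=None):
    """Set the OCIFS_* variables for this process and return what was set."""
    settings = {"OCIFS_IAM_TYPE": iam_type}
    if iam_type == "api_key":
        # The shell expands "~" in an export; Python does not, so expand it here.
        settings["OCIFS_CONFIG_LOCATION"] = os.path.expanduser(config_location)
        settings["OCIFS_CONFIG_PROFILE"] = profile
    if logging_level is not None:
        settings["OCIFS_LOGGING_LEVEL"] = logging_level
    os.environ.update(settings)
    return settings


# API-key authentication with debug logging switched on.
print(configure_ocifs_env(logging_level="DEBUG"))

# Resource principal: only the IAM type is set by this call.
print(configure_ocifs_env(iam_type="resource_principal"))
```

The second call does not remove variables set by the first; it only stops adding the config file settings, mirroring the note above that a resource principal needs only `OCIFS_IAM_TYPE`.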

Documentation

The ocifs documentation is available at https://ocifs.readthedocs.io/en/latest/

Support

The built-in filesystems in fsspec are maintained by the intake project team, whereas ocifs is an external implementation (similar to s3fs, gcsfs, adl/abfs, and so on) that is maintained by Oracle.

Contributing

This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide.

Security

Please consult the security guide for our responsible security vulnerability disclosure process.

License

Copyright (c) 2021, 2023 Oracle and/or its affiliates.

Released under the Universal Permissive License v1.0 as shown at https://oss.oracle.com/licenses/upl/.