wfau / gaia-dmp

Gaia data analysis platform
GNU General Public License v3.0

Set default filesystem to file:// #854

Open: Zarquan opened this issue 2 years ago

Zarquan commented 2 years ago

Spark DataFrames should default to writing to the file:// filesystem rather than the hdfs:// filesystem. We also need a PASS/FAIL test notebook that checks this is working correctly.
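A PASS/FAIL check could start from the default filesystem URI that Spark sees. The sketch below is a minimal, hypothetical helper (`default_fs_status` is not part of the repository) that only inspects the URI scheme. The comment shows one way a notebook might read the value through `spark.sparkContext._jsc`. That is an internal PySpark attribute rather than a public API, so treat that part as an assumption. A fuller test would also write a small DataFrame to a path with no scheme and confirm the files land on local disk.

```python
from urllib.parse import urlparse


def default_fs_status(default_fs: str) -> str:
    """Return "PASS" if the default filesystem URI uses the file:// scheme, else "FAIL".

    Assumption: in a PySpark notebook the URI could be obtained with
        spark.sparkContext._jsc.hadoopConfiguration().get("fs.defaultFS")
    where `_jsc` is an internal attribute, not a public PySpark API.
    """
    # urlparse splits "hdfs://host:9000" into scheme "hdfs", "file:///" into "file".
    scheme = urlparse(default_fs.strip()).scheme.lower()
    return "PASS" if scheme == "file" else "FAIL"
```

In a notebook cell, printing `default_fs_status(...)` gives a single visible PASS or FAIL. Checking only the scheme keeps the test independent of the HDFS host name.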

Zarquan commented 2 years ago

I think the setting is here: https://github.com/wfau/aglais/blob/master/deployments/hadoop-yarn/ansible/12-config-hadoop-core.yml#L53-L67

            <property>
                <name>fs.default.name</name>
                <value>hdfs://{{hdhost}}:9000</value>
            </property>

We also need to check that we are using the right property: fs.default.name is deprecated and has been replaced by fs.defaultFS. See https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
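A small stdlib parser can show which of the two property names a given core-site.xml actually sets. This sketch assumes the rendered (or template) file is available as a string. `read_default_fs` is a hypothetical helper. Checking fs.defaultFS before fs.default.name is a simplification of Hadoop's own deprecated-key handling. If neither key appears in the file and nothing else overrides it, Hadoop uses the fs.defaultFS default from core-default.xml, which is file:///.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

CURRENT_KEY = "fs.defaultFS"        # current property name
DEPRECATED_KEY = "fs.default.name"  # deprecated name it replaced


def read_default_fs(core_site_xml: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, key_used) for the default filesystem in a core-site.xml string.

    Simplification (assumption): fs.defaultFS is checked first, then the
    deprecated fs.default.name. Returns (None, None) if neither is set.
    """
    root = ET.fromstring(core_site_xml)
    props = {}
    # Collect every <property><name>..</name><value>..</value></property> pair.
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    for key in (CURRENT_KEY, DEPRECATED_KEY):
        if key in props:
            return props[key], key
    return None, None
```

For the snippet above, wrapped in its `<configuration>` element, the helper would return `("hdfs://{{hdhost}}:9000", "fs.default.name")`. That confirms both points raised in this issue: the default filesystem is HDFS, and the property name is the deprecated one.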