
Hadoop HDFS JAVA API Basic Usage #22

Opened by lqshow 6 years ago

Configuration

import

```java
import org.apache.hadoop.conf.Configuration;
```

```java
Configuration conf = new Configuration();

// override the default value of fs.defaultFS
conf.set("fs.defaultFS", "hdfs://localhost:9000");
```

FileSystem

FileSystem is used to obtain a file system object.

import

```java
import org.apache.hadoop.fs.FileSystem;
```

| method | description |
| --- | --- |
| `get(Configuration conf)` | Obtain the file system object described by `conf` |
| `get(URI uri, Configuration conf)` | Create a file system object from `uri` and `conf` |
| `get(URI uri, Configuration conf, String user)` | Obtain a file system object for `uri`, `conf`, and `user` |
| `getLocal(Configuration conf)` | Obtain the local file system |

Get FileSystem

```java
Configuration conf = new Configuration();

FileSystem local = FileSystem.getLocal(conf);
FileSystem hdfs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

// With the conf directory marked as a Resource Root in the IDE, the
// configuration files are on the classpath and this can be debugged directly
FileSystem fs = FileSystem.get(conf);
```

Get home directory

```java
Path homeDir = fs.getHomeDirectory();
// output: hdfs://localhost:9000/user/linqiong
```

Get working directory

```java
Path workingDir = fs.getWorkingDirectory();
// output: hdfs://localhost:9000/user/linqiong
```
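By default the working directory is the user's home directory, which is why the two outputs above match. It can be changed so that relative paths resolve elsewhere; a minimal sketch (the `/tmp` path is an illustrative assumption):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

// relative paths are resolved against the working directory
fs.setWorkingDirectory(new Path("/tmp"));

// "input" now resolves to /tmp/input under the working directory
System.out.println(fs.makeQualified(new Path("input")));
```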

Create a blank file in HDFS

```java
fs.createNewFile(new Path("output/newFile"));
```

Result:

```
(py2dev) ➜  /Users/linqiong/Downloads hadoop fs -ls output
Found 2 items
-rw-r--r--   1 linqiong supergroup          5 2017-12-03 14:50 output/file2.txt
-rw-r--r--   1 linqiong supergroup          0 2017-12-03 15:23 output/newFile
```
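Existence checks and deletion round out the basics; a sketch that removes the file created above (assuming it still exists):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

Path p = new Path("output/newFile");
if (fs.exists(p)) {
  // second argument: recursive delete, only needed for non-empty directories
  fs.delete(p, false);
}
```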

FileStatus

FileStatus is used to obtain file system metadata.

import

```java
import org.apache.hadoop.fs.FileStatus;
```

```java
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path path = new Path("folder");
FileStatus fss = fs.getFileStatus(path);
```
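The returned FileStatus carries the entry's metadata; a sketch of the common accessors (the "folder" path is the same illustrative one as above):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path path = new Path("folder");
FileStatus fss = fs.getFileStatus(path);

System.out.println("path:        " + fss.getPath());
System.out.println("isDirectory: " + fss.isDirectory());
System.out.println("length:      " + fss.getLen());
System.out.println("replication: " + fss.getReplication());
System.out.println("block size:  " + fss.getBlockSize());
System.out.println("modified at: " + fss.getModificationTime());

// listStatus returns one FileStatus per entry of a directory
for (FileStatus s : fs.listStatus(path)) {
  System.out.println(s.getPath().getName());
}
```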

Creating File in HDFS

Writing data to an HDFS file

```java
/**
 * Write data to an HDFS file
 * @param content   raw data
 * @param dstPath   destination file in HDFS
 * @throws IOException
 */
public void writingData2HdfsFile(String content, String dstPath) throws IOException {
  // Get the configuration of the Hadoop system
  Configuration conf = new Configuration();
  try (
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream fsos = fs.create(new Path(dstPath))
  ) {
    // use an explicit charset rather than the platform default
    byte[] buffer = content.getBytes(StandardCharsets.UTF_8);
    fsos.write(buffer, 0, buffer.length);
  }
}
```
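Appending to an existing HDFS file

Note that `fs.create` overwrites any existing file. FileSystem also provides `append` for adding to one, provided the cluster permits appends. A hedged sketch mirroring the method above (the method name is an assumption):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppendSketch {
  /**
   * Append data to an existing HDFS file
   * @param content   raw data
   * @param dstPath   existing file in HDFS
   * @throws IOException
   */
  public void appendingData2HdfsFile(String content, String dstPath) throws IOException {
    Configuration conf = new Configuration();
    try (
      FileSystem fs = FileSystem.get(conf);
      // append requires that the file already exists and that the cluster allows appends
      FSDataOutputStream fsos = fs.append(new Path(dstPath))
    ) {
      fsos.write(content.getBytes(StandardCharsets.UTF_8));
    }
  }
}
```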

Writing a local file to HDFS

```java
/**
 * Write a local file to HDFS
 * @param localSrc  source file in the local file system
 * @param hdfsDst   destination file in HDFS
 * @throws IOException
 */
public void srcFile2HdfsFile(String localSrc, String hdfsDst) throws IOException {

  // Input stream for the local file to be written to HDFS
  InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

  // Get the configuration of the Hadoop system
  Configuration conf = new Configuration();
  System.out.println("Connecting to -- " + conf.get("fs.defaultFS"));

  // Destination file in HDFS
  FileSystem fs = FileSystem.get(URI.create(hdfsDst), conf);
  OutputStream out = fs.create(new Path(hdfsDst));

  // Copy the file from local to HDFS; the final 'true' closes both streams
  IOUtils.copyBytes(in, out, 4096, true);

  System.out.println(hdfsDst + " copied to HDFS");
}
```
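The manual stream handling above can also be delegated to FileSystem itself via `copyFromLocalFile`; a sketch (the method name `copyLocal2Hdfs` is a hypothetical counterpart):

```java
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopySketch {
  /**
   * Copy a local file to HDFS using the built-in helper
   * @param localSrc  source file in the local file system
   * @param hdfsDst   destination file in HDFS
   * @throws IOException
   */
  public void copyLocal2Hdfs(String localSrc, String hdfsDst) throws IOException {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(URI.create(hdfsDst), conf)) {
      // delSrc = false keeps the local file; overwrite = true replaces an existing target
      fs.copyFromLocalFile(false, true, new Path(localSrc), new Path(hdfsDst));
    }
  }
}
```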

Copying a source HDFS file to a destination HDFS file

```java
/**
 * Copy a file within HDFS, transcoding it from UTF-8 to GBK
 * @param srcPath   source file in HDFS
 * @param dstPath   destination file in HDFS
 * @throws IOException
 */
public void srcHdfsFile2DstHdfsFile(String srcPath, String dstPath) throws IOException {
  Configuration conf = new Configuration();
  try (
    FileSystem fs = FileSystem.get(conf);

    // Source file in HDFS, read as UTF-8
    FSDataInputStream fis = fs.open(new Path(srcPath));
    BufferedReader br = new BufferedReader(new InputStreamReader(fis, StandardCharsets.UTF_8));

    // Destination file in HDFS, written as GBK
    FSDataOutputStream fos = fs.create(new Path(dstPath));
    BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fos, Charset.forName("GBK")))
  ) {
    String line;
    while ((line = br.readLine()) != null) {
      bw.append(line).append(System.lineSeparator());
    }
  }
}
```
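Reading an HDFS file

The reverse direction uses `fs.open`; a sketch that streams an HDFS file to standard output (the method name is an assumption):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadSketch {
  /**
   * Print an HDFS file to standard output
   * @param srcPath   source file in HDFS
   * @throws IOException
   */
  public void printHdfsFile(String srcPath) throws IOException {
    Configuration conf = new Configuration();
    try (
      FileSystem fs = FileSystem.get(conf);
      FSDataInputStream fis = fs.open(new Path(srcPath))
    ) {
      // 'false' leaves System.out open; the input stream is closed by try-with-resources
      IOUtils.copyBytes(fis, System.out, 4096, false);
    }
  }
}
```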
